According to my calendar, the update release should be available now.
I am still assuming you are talking about "aoc -c" compile time and that your parallel load is from global memory. If so, make sure your access pattern is simple. If you access consecutive elements of a global buffer and unroll the loop, the compiler will merge your accesses into wider accesses, which reduces the number of load units.
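As a minimal sketch of what that looks like (kernel and buffer names are made up; `#pragma unroll` is the FPGA SDK for OpenCL unroll pragma, and the coalescing itself is done by the offline compiler, not anything in the source):

```c
// Sketch: consecutive global-memory accesses inside an unrolled loop.
// Because the four loads hit adjacent addresses, the offline compiler
// can merge them into a single wider (e.g. 128-bit) load unit.
__kernel void sum4(__global const float * restrict src,
                   __global float * restrict dst)
{
    float acc = 0.0f;
    #pragma unroll 4
    for (int i = 0; i < 4; i++)
        acc += src[i];   // consecutive addresses -> coalescable
    dst[0] = acc;
}
```

If the indexing were strided or data-dependent instead of consecutive, the compiler would have to instantiate separate load units, which is where the load/store count (and compile time) blows up.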
You can also try sending the data to the kernel via a channel, as you suggested. This may increase resource usage because of the channels, but it may also speed up compilation because you are splitting your memory accesses across two kernels.
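A rough sketch of that split, assuming the Altera channels extension (kernel names and the channel are made up; the extension pragma and `read/write_channel_altera` calls follow the Altera SDK for OpenCL of that era, so check the names against your SDK version):

```c
#pragma OPENCL EXTENSION cl_altera_channels : enable

channel float data_ch;

// I/O kernel: the only kernel that touches global memory for 'src'.
__kernel void reader(__global const float * restrict src, int n)
{
    for (int i = 0; i < n; i++)
        write_channel_altera(data_ch, src[i]);
}

// Compute kernel: consumes the stream and never loads 'src' itself,
// so its loop body contains no global loads to replicate when unrolled.
__kernel void worker(__global float * restrict dst, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += read_channel_altera(data_ch);
    dst[0] = acc;
}
```

Both kernels are enqueued as tasks; the channel decouples the memory interface from the compute pipeline, at the cost of the channel's FIFO resources.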
--- Quote Start ---
I think I'm running into the "long compile times due to large loops being unrolled" issue. I have a task kernel, and synthesis does not complete in a reasonable amount of time if I unroll by more than 320. When can we expect 14.0 update 1?
"However, in general, users should try to avoid having excessive number of load/store instructions for performance reasons."
If I'm doing a large parallel load, what are my other options? Should I create an I/O kernel that writes to a channel instead?
--- Quote End ---