num_compute_units effect on concurrent workgroups

Honored Contributor

8 years ago

--- Quote Start ---

The warning you get is most definitely caused by the thread-id-dependent loop bound, which results in threads reaching the barrier in an arbitrary order. Using num_compute_units() will not affect this issue. num_compute_units fully replicates the pipeline, allowing the compiler to schedule multiple work-groups in parallel, each in a different compute unit. Altera recommends having at least three times as many work-groups as compute units in order to fully utilize the circuit. From what I understand, this issue limits the number of parallel work-groups per compute unit (each region between two barriers in the same compute unit can be occupied by a different work-group), not the total number of parallel work-groups in flight across different compute units, but I could be wrong.

--- Quote End ---

I have actually tried this out and found that increasing the number of compute units does not allow 2*num_compute_units work-groups to run concurrently, as you suggest. Any insight into why this might be the case?

I can also see that the problem lies with the thread-id-dependent branching, and I will try to address this. Is there a good way to work in the local scope like this without using the thread id to branch?
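For reference, here is a minimal sketch of one common way to avoid an id-dependent loop bound: give every work-item the same trip count and use the local id only to predicate the work inside each step. This is a host-side emulation in plain C, not a kernel and not something taken from the Altera documentation. The names `WG_SIZE`, `barriers_id_dependent` and `scan_uniform`, as well as the choice of a prefix sum as the workload, are assumptions made purely for illustration.

```c
#include <assert.h>

#define WG_SIZE 8  /* assumed work-group size, illustrative only */

/* Number of barriers a work-item would reach if the loop bound depended on
   its local id, as in: for (i = 0; i < lid; i++) { ...; barrier(...); } */
static int barriers_id_dependent(int lid)
{
    int reached = 0;
    for (int i = 0; i < lid; i++)
        reached++;              /* stands in for a barrier() inside the loop */
    return reached;
}

/* Emulated inclusive prefix sum over one work-group. Every work-item runs the
   same number of steps; the local id only predicates the work within a step.
   Returns the number of barriers each work-item passes through. */
static int scan_uniform(int data[WG_SIZE])
{
    int barriers = 0;
    for (int stride = 1; stride < WG_SIZE; stride <<= 1) {  /* uniform bound */
        int addend[WG_SIZE];
        for (int lid = 0; lid < WG_SIZE; lid++)             /* each work-item */
            addend[lid] = (lid >= stride) ? data[lid - stride] : 0;
        barriers++;  /* barrier: all reads finish before any write */
        for (int lid = 0; lid < WG_SIZE; lid++)
            data[lid] += addend[lid];
        barriers++;  /* barrier: writes are visible before the next step */
    }
    return barriers;
}
```

With the id-dependent bound, work-item 0 reaches no barrier while work-item 5 reaches five, which is the mismatch the quoted reply describes. In the uniform version every work-item passes the same number of barriers regardless of its id. In an actual kernel the inner loops over `lid` would disappear (each work-item handles its own index) and the two counter increments would become `barrier(CLK_LOCAL_MEM_FENCE)` calls. Whether this removes the compiler warning in a given design is something I would still verify in the compilation report.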
