Forum Discussion
The warning you get is most definitely caused by the thread-id-dependent loop bound: since the number of loop iterations differs from thread to thread, threads reach the barrier in an arbitrary order. Using num_compute_units() will not affect this issue. num_compute_units fully replicates the pipeline, which lets the compiler schedule multiple work-groups in parallel, each in a different compute unit. Altera recommends launching at least three times as many work-groups as there are compute units to fully utilize the circuit.

From what I understand, this issue limits the number of work-groups that can run in parallel within a single compute unit (normally each region between two barriers in the same compute unit can be occupied by a different work-group). It does not limit the total number of work-groups in flight across different compute units, but I could be wrong.