Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

--- Quote Start ---
The warning you get is most definitely caused by the thread-id-dependent loop bound, which results in threads reaching the barrier in an arbitrary order. Using num_compute_units() will not affect this issue. num_compute_units fully replicates the pipeline, allowing the compiler to schedule multiple work-groups in parallel, each in a different compute unit. Altera recommends having at least three times as many work-groups as compute units in order to fully utilize the circuit. From what I understand, this issue limits the number of parallel work-groups per compute unit (each region between two barriers in the same compute unit can be occupied by a different work-group), not the total number of parallel work-groups in flight across different compute units, but I could be wrong.
--- Quote End ---

I have actually tried this out and found that increasing the number of compute units does not allow for 2*num_compute_units parallel work-groups, as you suggest. Any insight into why this might be the case?

I can also see that the problem lies with the thread-id-dependent branching, and I will try to address this. Is there a good way to work in local scope like this without using the thread id to branch?
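
One general way to avoid a thread-id-dependent loop bound is to give every work-item the same trip count and move the id-dependent part into a predicate inside the loop body. The sketch below is plain C that emulates the work-items on the host, not an actual kernel from this thread. `WG_SIZE`, `sum_id_dependent`, `sum_uniform` and `variants_agree` are hypothetical names chosen for illustration. Whether this rewrite removes the compiler warning discussed above is an assumption that has not been confirmed here.

```c
#include <stddef.h>

/* Hypothetical work-group size, chosen only for this illustration. */
#define WG_SIZE 8

/* Id-dependent form: work-item `lid` runs lid + 1 iterations, so the
 * trip count differs between work-items (the pattern blamed in the quote). */
static int sum_id_dependent(const int *data, size_t lid)
{
    int acc = 0;
    for (size_t i = 0; i <= lid; ++i) {
        acc += data[i];
    }
    return acc;
}

/* Uniform form: every work-item runs exactly WG_SIZE iterations.
 * The id only selects whether an element contributes, so the loop
 * bound itself no longer depends on the work-item id. */
static int sum_uniform(const int *data, size_t lid)
{
    int acc = 0;
    for (size_t i = 0; i < WG_SIZE; ++i) {
        acc += (i <= lid) ? data[i] : 0;
    }
    return acc;
}

/* Returns 1 if both forms give the same result for every emulated
 * work-item in the group, 0 otherwise. */
static int variants_agree(const int *data)
{
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        if (sum_id_dependent(data, lid) != sum_uniform(data, lid)) {
            return 0;
        }
    }
    return 1;
}
```

In a real kernel the `lid` argument would come from get_local_id(0) and the barrier would follow the loop. The trade-off is that every work-item now does the maximum amount of work, in exchange for all of them reaching the barrier after the same number of iterations.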