Forum Discussion
Altera_Forum
Honored Contributor
7 years agoThere are two version of "num_compute_units()"; one is num_compute_units(X) (one dimensional) which is for ndrange kernels, the other one is num_compute_units(X, Y, Z) (three-dimensional) which is for single work-item autorun kernels that need to be coupled with get_compute_id(). The former is the one that is referenced in the forum thread you mentioned (note that I was talking about single work-group and not single work-item kernels in that thread) and the latter is the one that is mentioned in Altera's documents. There is no way to automatically replicate non-autorun single work-item kernels.
However, in your case, why don't you just partially unroll your loop? Loop unrolling is the most area-efficient way of improving performance.