Altera_Forum
Honored Contributor
11 years ago

Sean,
You say the preferred implementation should follow Fig. 6 (single work-item execution, with the inner loop unrolled using shift registers) rather than Fig. 7 (an NDRange implementation, with the outer loop distributed among work-items). Why is this the case? Is it a programmability consideration, or are there performance implications?
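To make sure we are talking about the same two structures, here is a rough sketch in plain C of how I read them. The figures are not reproduced here, so everything in it is my own assumption: the 4-tap FIR filter, the names `fir_single_work_item` and `fir_ndrange_work_item`, and the `gid` parameter standing in for what `get_global_id(0)` would return in a real OpenCL kernel.

```c
#include <stddef.h>

#define TAPS 4  /* hypothetical filter length, chosen only for illustration */

/* Single work-item style (my reading of Fig. 6): one loop walks the whole
 * input, and a small shift register keeps the last TAPS samples so each
 * input element is read from memory only once. Both inner loops have a
 * fixed trip count, so a compiler could fully unroll them. */
void fir_single_work_item(const float *in, float *out, size_t n,
                          const float coef[TAPS])
{
    float shift_reg[TAPS] = {0};

    for (size_t i = 0; i < n; ++i) {
        /* Shift every stored sample one slot down, then insert the new one. */
        for (int k = TAPS - 1; k > 0; --k)
            shift_reg[k] = shift_reg[k - 1];
        shift_reg[0] = in[i];

        float acc = 0.0f;
        for (int k = 0; k < TAPS; ++k)
            acc += coef[k] * shift_reg[k];
        out[i] = acc;
    }
}

/* NDRange style (my reading of Fig. 7): the outer loop disappears and each
 * work-item computes one output. gid is a stand-in for get_global_id(0).
 * Every work-item re-reads its own window of TAPS samples from memory. */
void fir_ndrange_work_item(const float *in, float *out, size_t gid,
                           const float coef[TAPS])
{
    float acc = 0.0f;
    for (int k = 0; k < TAPS; ++k)
        if (gid >= (size_t)k)          /* samples before in[0] count as zero */
            acc += coef[k] * in[gid - k];
    out[gid] = acc;
}
```

Calling `fir_ndrange_work_item` once per index should give the same output as a single call to `fir_single_work_item`. The sketch only shows the structural difference between the two styles; it says nothing about which one performs better, which is the part I am asking about.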