Forum Discussion
Altera_Forum
Honored Contributor
8 years agoWell, either way, your loop on "m" is unpipelineable the way it is. You should either merge the loops on "m" and "n" into one loop, or convert your kernel to NDRange with both "m" and "n" absorbed into the thread dimensions, and let the runtime scheduler handle the job of minimizing the pipeline stalls/bubbles.