Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
Hi HRZ,
Thanks for the answer. I just need more clarification on one of your points. For single work-item kernels, you said "loop iterations" are initiated into the pipeline with some fixed initiation interval (II), which I completely understand. For ND-Range kernels, on the other hand, threads are scheduled and pushed into the pipeline. So if we have a for loop (not unrolled) in ND-Range mode, how is it managed? Does it mean that when a thread enters the pipeline and reaches the loop, each iteration can start only after the previous one has completely finished? I'm asking because I've noticed that loop-carried data dependencies significantly affect the II and the performance in single work-item mode, but in ND-Range mode I don't see any such effect. Can you elaborate on how the combination of multiple threads and loops is mapped onto the FPGA?
Thanks,
Saman
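To make the question concrete, the two kernel shapes being compared can be sketched in plain C. This is only an illustration under assumptions: the names (`ndrange_body`, `run_ndrange`, `run_single_work_item`, `forms_agree`) and the sizes `N_ITEMS` and `LEN` are hypothetical, the host-side loop over `gid` merely stands in for the runtime launching work-items, and nothing here says how the compiler actually schedules either form. It only shows where the loop-carried dependency on `acc` sits in each case.

```c
#include <assert.h>
#include <stddef.h>

#define N_ITEMS 4   /* hypothetical number of work-items / outer iterations */
#define LEN 8       /* hypothetical trip count of the non-unrolled loop */

/* ND-Range style: this body runs once per work-item, identified by gid.
   The loop carries a dependency on acc from one iteration to the next,
   but only within the same work-item. */
static void ndrange_body(size_t gid, const float *in, float *out)
{
    float acc = 0.0f;
    for (size_t i = 0; i < LEN; ++i) {
        acc += in[gid * LEN + i];   /* loop-carried dependency on acc */
    }
    out[gid] = acc;
}

/* Stand-in for the runtime issuing N_ITEMS work-items (assumption:
   a sequential loop is used here purely for functional illustration). */
static void run_ndrange(const float *in, float *out)
{
    for (size_t gid = 0; gid < N_ITEMS; ++gid) {
        ndrange_body(gid, in, out);
    }
}

/* Single work-item style: one invocation, with both loops inside the
   kernel. The same dependency on acc now sits in the inner loop. */
static void run_single_work_item(const float *in, float *out)
{
    for (size_t row = 0; row < N_ITEMS; ++row) {
        float acc = 0.0f;
        for (size_t i = 0; i < LEN; ++i) {
            acc += in[row * LEN + i];   /* loop-carried dependency on acc */
        }
        out[row] = acc;
    }
}

/* Runs both forms on the same input and checks that the results match.
   Returns 1 if every output element agrees. */
static int forms_agree(void)
{
    float in[N_ITEMS * LEN];
    float a[N_ITEMS];
    float b[N_ITEMS];

    for (size_t k = 0; k < N_ITEMS * LEN; ++k) {
        in[k] = (float)k;   /* small integers, exactly representable */
    }
    run_ndrange(in, a);
    run_single_work_item(in, b);
    for (size_t g = 0; g < N_ITEMS; ++g) {
        assert(a[g] == b[g]);
    }
    return 1;
}
```

Both forms compute the same result; the question is only about how the dependency on `acc` affects scheduling when the outer dimension is expressed as work-items rather than as a loop.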