Different kernels of same algorithm give different throughputs


8 years ago

Hi HRZ,

Thanks for the answer.

I just need more clarification on one of your points. For single work-item kernels, you said "loop iterations" are initiated into the pipeline (with some fixed II), which I completely understand. In ND-Range mode, on the other hand, threads are scheduled and pushed into the pipeline. So if we have a for loop (not unrolled) in an ND-Range kernel, how is it managed? Does that mean that when a thread enters the pipeline and reaches the loop, each iteration can only start after the previous one has completely finished?

I'm asking because I've noticed that loop-carried data dependencies significantly affect the II value and the performance in single work-item mode, but in ND-Range mode I don't see any such effect. Can you elaborate on how a combination of multiple threads and loops is mapped to the FPGA?
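
To make the question concrete, the two cases can be sketched in plain C roughly as follows. This is a hypothetical illustration, not the actual kernels from this thread: the names `single_work_item`, `ndrange_work_item`, `compare_styles`, and the sizes `N` and `M` are all assumptions, and `gid` merely stands in for what `get_global_id(0)` would return in an OpenCL kernel. In both versions the accumulation into `acc` is a loop-carried dependency.

```c
#include <assert.h>

#define N 4   /* hypothetical number of work-items / outer iterations */
#define M 8   /* hypothetical trip count of the inner, non-unrolled loop */

/* Single work-item style: one thread walks every output element.
   The update of acc depends on the value from the previous iteration,
   which is the loop-carried dependency mentioned above. */
static void single_work_item(const float *in, float *out) {
    for (int i = 0; i < N; i++) {
        float acc = 0.0f;
        for (int j = 0; j < M; j++) {
            acc += in[i * M + j];   /* needs acc from iteration j - 1 */
        }
        out[i] = acc;
    }
}

/* ND-Range style: this body would run once per work-item.
   gid is an assumption standing in for get_global_id(0).
   The same dependency on acc exists inside each work-item's loop. */
static void ndrange_work_item(int gid, const float *in, float *out) {
    float acc = 0.0f;
    for (int j = 0; j < M; j++) {
        acc += in[gid * M + j];
    }
    out[gid] = acc;
}

/* Runs both styles on the same input and returns 1 if the results match. */
static int compare_styles(void) {
    float in[N * M], a[N], b[N];
    for (int k = 0; k < N * M; k++) {
        in[k] = (float)k;
    }
    single_work_item(in, a);
    for (int gid = 0; gid < N; gid++) {   /* host-side stand-in for the ND-Range launch */
        ndrange_work_item(gid, in, b);
    }
    for (int i = 0; i < N; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

The two functions compute the same result; the sketch only shows where the loop and its dependency sit in each style, not how the compiler schedules them.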

Thanks,

Saman
