ND-Range kernels vs SingleThread?
Dear All,
I have a question about choosing between ND-Range and Single Thread (single work-item) kernels. The Intel FPGA best practices guide stresses that the Single Thread model is generally preferred over ND-Range. I have tried several real and synthetic kernels, as well as many real applications, and I can claim that in 95% of cases the ND-Range kernels perform much faster than the Single Thread versions. Looking at the compiler report, I can see that the Single Thread kernels are scheduled with an initiation interval (II) of 1, and the operating frequencies are not that different either. I also make sure the memory access patterns are defined so that accesses can be fully coalesced.
I'm using SDK 16.0 with a Nallatech p385a FPGA card. I'm really struggling to explain the huge performance difference, and even to convince myself that what the Intel documentation claims is not entirely true.
So my questions are: is the dynamic interleaving of work-items in ND-Range mode much more powerful than the static scheduling of Single Thread kernels? Is it fair to claim that kernels with a simple level of data parallelism are always better developed in ND-Range mode, while kernels with little parallelism and a high degree of dependencies are better suited to Single Thread? And is there any scenario in which a kernel can be written both ways and the Single Thread version outperforms the ND-Range one?
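To make the terminology concrete, here is a trivial synthetic example (a hypothetical vector add, not one of my actual kernels) of the kind of kernel I mean, written in both styles:

```c
// ND-Range version: one work-item per element. The compiler pipelines
// work-items through the datapath, interleaving them dynamically to
// hide memory latency.
__kernel void vadd_ndrange(__global const float * restrict a,
                           __global const float * restrict b,
                           __global float * restrict c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}

// Single work-item (Single Thread) version: one thread running a loop
// that the compiler tries to pipeline with II=1, so performance relies
// entirely on static scheduling.
__kernel void vadd_single(__global const float * restrict a,
                          __global const float * restrict b,
                          __global float * restrict c,
                          int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Even in simple cases like this, where the loop pipelines with II=1, I see the ND-Range version run faster.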
I would appreciate any clarification of the reasons behind my observations.
Best,
Saman