Forum Discussion
Altera_Forum
Honored Contributor
7 years ago
Regarding II and unrolling: you should only unroll a loop once it has an II (initiation interval) of one. An II of one means one loop iteration is being processed per clock cycle. If you then unroll the loop by a factor of X, X iterations are processed per cycle, which increases your performance by a factor of X (as long as memory bandwidth is not saturated). When II is above one (e.g. due to loop-carried dependencies), unrolling the loop by a factor of X will also increase II by a factor of X, cancelling out the performance improvement from unrolling.
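To make the arithmetic above concrete, here is a rough sketch in Python. It is only a simplified model under stated assumptions, not the compiler's actual behaviour and not the model from any particular paper: the function name `estimated_cycles` and its parameters are made up for illustration, the pipeline depth is held constant, and II is assumed to scale exactly with the unroll factor whenever it is above one.

```python
import math


def estimated_cycles(trip_count, ii=1, depth=60, unroll=1):
    """Rough cycle count for a single pipelined loop (illustrative only).

    Assumptions:
      * the unrolled loop issues ceil(trip_count / unroll) iterations;
      * a new iteration is issued every `effective_ii` cycles after the
        pipeline has filled (`depth` cycles);
      * if ii == 1, unrolling keeps II at 1; if ii > 1, unrolling by X
        multiplies II by X (the loop-carried dependency case);
      * pipeline depth does not change with unrolling (in reality it grows
        somewhat, but by less than the unroll factor).
    """
    iterations = math.ceil(trip_count / unroll)
    effective_ii = ii if ii == 1 else ii * unroll
    return depth + effective_ii * (iterations - 1)


if __name__ == "__main__":
    n = 16384
    # II = 1: unrolling by 4 cuts the cycle count by roughly 4x.
    print(estimated_cycles(n, ii=1, unroll=1))  # 16443
    print(estimated_cycles(n, ii=1, unroll=4))  # 4155
    # II = 2: unrolling by 4 raises II to 8, so almost nothing is gained.
    print(estimated_cycles(n, ii=2, unroll=1))  # 32826
    print(estimated_cycles(n, ii=2, unroll=4))  # 32820
```

With II of one the estimate drops by close to the unroll factor, while with II of two the unrolled and non-unrolled estimates are nearly identical, which is the cancellation described above.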
Regarding the performance improvement with a larger loop trip count: this is possible if your loop trip count is small compared to the pipeline depth, because in that case the pipeline latency dominates the run time. However, when the loop trip count is large compared to the pipeline depth, increasing it further should not affect performance. In your case, based on Altera's report, the pipeline latency is around 60 clocks, which means even your minimum loop trip count of 16384 is large enough to hide the pipeline latency.
I believe the improvement you are seeing with a higher loop trip count is likely an artifact of your timing or FLOP/s calculation rather than a real performance difference. For example, if you are including host-to-device transfer time in your measurement, then the higher your loop trip count is, the longer your kernel run time will be and the smaller the relative overhead of the transfer becomes; hence, you will measure higher performance. Also, if your kernel run time is too short, your timing function could be reporting the run time inaccurately, which lowers the reported performance when the loop trip count is small. Make sure your kernels run for at least a few hundred milliseconds.
You can find a general performance model for OpenCL on FPGAs here: https://dl.acm.org/citation.cfm?id=3014951
That paper explains how II and loop unrolling affect run time. Note, though, that its assumption that pipeline depth increases in proportion to the loop unroll factor is incorrect: pipeline depth does increase with loop unrolling, but not by as much as the unroll factor.
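The two effects described above (pipeline fill being negligible at a trip count of 16384, and a fixed overhead inflating the apparent gain from larger trip counts) can be sketched roughly as follows. This is only an illustrative model: the function names, the 200 MHz clock, and the 1 ms fixed overhead are assumptions chosen for the example, not values from your report.

```python
def fill_fraction(trip_count, depth=60, ii=1):
    """Fraction of the estimated run time spent filling the pipeline.

    Assumes total cycles = depth + ii * (trip_count - 1).
    """
    total_cycles = depth + ii * (trip_count - 1)
    return depth / total_cycles


def apparent_rate(trip_count, clock_hz, fixed_overhead_s, depth=60, ii=1):
    """Iterations per second as a timer would report them.

    `fixed_overhead_s` stands in for any cost that does not scale with the
    trip count (e.g. a transfer or launch overhead included in the timed
    region). It is a hypothetical parameter for this sketch.
    """
    kernel_s = (depth + ii * (trip_count - 1)) / clock_hz
    return trip_count / (kernel_s + fixed_overhead_s)


if __name__ == "__main__":
    clock_hz = 200e6  # assumed kernel clock, for illustration only

    # Pipeline fill is well under 1% of the run time at 16384 iterations.
    print(f"fill fraction: {fill_fraction(16384):.4%}")

    for overhead in (0.0, 1e-3):  # no overhead vs. an assumed 1 ms overhead
        small = apparent_rate(16384, clock_hz, overhead)
        large = apparent_rate(1048576, clock_hz, overhead)
        print(f"overhead={overhead:g}s  large/small rate ratio: {large / small:.2f}")
```

Without the fixed overhead, the rate is essentially the same for both trip counts; with it, the larger trip count appears much faster even though the kernel itself is no more efficient.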