yuguen
Occasional Contributor
4 years ago
From what I can see in this code snippet, your inner loop reads from and writes to DDR and also performs blocking pipe operations. It therefore has to be in a stall-enabled cluster, as both the DDR accesses and the pipes may stall your kernel.
If you want to have a stall-free compute loop, you'll want to remove both the DDR accesses and the pipe operations.
If I understand your issue correctly, what stalls your compute kernel is the memory accesses and not the pipe operations?
In that case, you may want to copy the data you need to compute on into a local memory, have your compute kernel work on that local memory, and write its results to another local memory. The results local memory can then be copied back to DDR.
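To illustrate the load / compute / store split described above, here is a minimal sketch in plain C++ rather than an actual kernel. Everything named in it is an assumption for illustration: `kTileSize`, `compute`, and `process` are hypothetical, and the vectors `ddr_in` and `ddr_out` merely stand in for buffers that live in DDR. In a real design I would expect the two local arrays to be declared inside the kernel so that the compiler can place them in on-chip memory, but check the reports for your own code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size: how many elements are staged locally at once.
constexpr std::size_t kTileSize = 256;

// Placeholder for the real computation (an assumption for illustration).
inline float compute(float x) { return 2.0f * x + 1.0f; }

// ddr_in and ddr_out stand in for buffers located in off-chip DDR.
void process(const std::vector<float>& ddr_in, std::vector<float>& ddr_out) {
  assert(ddr_in.size() == ddr_out.size());

  // Stand-ins for the input and result local memories.
  std::array<float, kTileSize> local_in{};
  std::array<float, kTileSize> local_out{};

  for (std::size_t base = 0; base < ddr_in.size(); base += kTileSize) {
    const std::size_t n = std::min(kTileSize, ddr_in.size() - base);

    // Phase 1: copy one tile from DDR into local memory (may stall).
    for (std::size_t i = 0; i < n; ++i) {
      local_in[i] = ddr_in[base + i];
    }

    // Phase 2: compute loop touches local memory only, no DDR and no pipes.
    for (std::size_t i = 0; i < n; ++i) {
      local_out[i] = compute(local_in[i]);
    }

    // Phase 3: copy the results from local memory back to DDR (may stall).
    for (std::size_t i = 0; i < n; ++i) {
      ddr_out[base + i] = local_out[i];
    }
  }
}
```

The point of the structure is that only the first and third loops touch DDR, so the middle loop is the one candidate for a stall-free cluster. The tile size would have to be chosen to fit the on-chip memory available on your device.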