Forum Discussion
yuguen
Occasional Contributor
4 years ago
If I understand your problem correctly, I would structure the kernel as an outer loop over three steps:
1/ a loop that reads a chunk of DDR and stores the data in two local memories, one for each of your accessors
2/ the computation, which works only on these local memories
3/ a loop that writes the two local memories back to DDR.
If you have enough private copies of the local memories, the compiler will schedule 1/, 2/ and 3/ in parallel.
So while you are computing 2/ on one chunk, the next chunk is being read from DDR and the previously computed local memory is being written back to DDR.
With these local memories, you should never stall because of DDR (assuming your kernel is compute bound).
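To make the structure concrete, here is a minimal sketch of the three steps in plain C++ rather than actual kernel code. The names `process_in_chunks` and `kChunk`, the chunk size, and the sum/difference computation are all placeholders I made up for illustration, and I am assuming the buffer length is a multiple of the chunk size. On a CPU this simply runs sequentially; the point is only the loop shape, with the local arrays declared inside the outer loop so that each iteration can get its own copy.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical chunk size: number of elements staged in local memory at once.
constexpr std::size_t kChunk = 4;

// Plain C++ stand-in for the kernel body. `a` and `b` play the role of the
// two DDR buffers behind the accessors. Assumes a.size() == b.size() and
// that the size is a multiple of kChunk.
void process_in_chunks(std::vector<int>& a, std::vector<int>& b) {
  for (std::size_t base = 0; base < a.size(); base += kChunk) {
    // Declared inside the outer loop: in an FPGA kernel these would be the
    // on-chip local memories, one per accessor.
    std::array<int, kChunk> local_a;
    std::array<int, kChunk> local_b;

    // 1/ read one chunk of DDR into the two local memories
    for (std::size_t i = 0; i < kChunk; ++i) {
      local_a[i] = a[base + i];
      local_b[i] = b[base + i];
    }

    // 2/ compute on the local memories only (placeholder computation:
    //    replace each pair with its sum and difference)
    for (std::size_t i = 0; i < kChunk; ++i) {
      const int sum = local_a[i] + local_b[i];
      const int diff = local_a[i] - local_b[i];
      local_a[i] = sum;
      local_b[i] = diff;
    }

    // 3/ write the two local memories back to DDR
    for (std::size_t i = 0; i < kChunk; ++i) {
      a[base + i] = local_a[i];
      b[base + i] = local_b[i];
    }
  }
}
```

In a real oneAPI FPGA kernel the same three inner loops would sit inside the kernel's outer loop. As far as I know, the number of private copies can be hinted with the `[[intel::private_copies(N)]]` attribute on the local arrays, but check the compiler report to see what was actually generated.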