In my OpenCL code, whenever I read from global memory I get a stall rate of around 30%. I am using float16 to read 16 contiguous memory locations. Can anyone suggest a way to re...
Is this the same code as the one in your other thread? The read access is 512-bit wide which means it can efficiently saturate the bandwidth of one memory bank. However, the write is narrow and unless you have multiple consecutive writes that are coalesced into one wider access, write performance is going to be poor. You should check the HTML report to see how the memory ports are instantiated by the compiler. Since, by default, the compiler interleaves buffers if your board has two memory banks, your reads and writes can actually conflict. But if you disable memory interleaving as mentioned in the Programming Guide and put the input and output buffers on separate memory banks, then the accesses will not conflict with each other.
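To illustrate the point about narrow writes, here is a minimal sketch of a single work-item kernel whose store is as wide as its load. I have not seen your full kernel, so everything here is an assumption for illustration: the kernel name `scale_wide`, the arguments `in`, `out`, `n_vec` and `gain`, and the multiply itself are all hypothetical. The `#ifndef __OPENCL_VERSION__` block is only a host-side shim (GCC/Clang vector extensions) so that the same text can be checked as plain C; an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shims (assumption: GCC or Clang). An OpenCL compiler
   defines __OPENCL_VERSION__ and never sees these lines. */
#define __kernel
#define __global
typedef float float16 __attribute__((vector_size(64)));
#endif

/* Hypothetical kernel: one 512-bit load and one 512-bit store per
   iteration, so the write side is as wide as the read side. */
__kernel void scale_wide(__global const float16 *restrict in,
                         __global float16 *restrict out,
                         const int n_vec,   /* number of float16 elements */
                         const float gain)
{
    for (int i = 0; i < n_vec; i++) {
        float16 v = in[i];   /* single wide read of 16 consecutive floats */
        out[i] = v * gain;   /* single wide write instead of 16 narrow ones */
    }
}
```

Whether this actually lowers the stall rate depends on the rest of the kernel and on the board, so it is worth checking the HTML report again to see how many load and store units the compiler generates for it.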
Yes, that is the same code as in my other thread. According to my board specification, I have only one DDR4 memory bank. In the generated report I can see that multiple ports are created for storing the data, as shown in the attached photo. I thought that even with one memory bank there would be separate read and write ports, which would mean there should not be any memory contention. Is that not correct?
Although I am writing to consecutive locations, the writes are not coalesced. Is that correct? If so, would a single float8 or float16 write together with a single float16 read be the best case, given that I have only one memory bank?