Forum Discussion
Can you also post your kernel header/variable declarations? I am not sure which buffer is local and which is global. Note that the system viewer tab in the HTML report provides invaluable details about the memory ports and their widths, which can help considerably in debugging memory performance.
- ADua0, 6 years ago
Occasional Contributor
Here are my header declarations
- HRZ, 6 years ago
Frequent Contributor
Unless I am missing something here, you should get one 512-bit read port for input, one 512-bit write port for output, and one 128-bit port for mask. Assuming that your board has two banks of DDR3/4 memory, this should give you relatively good memory throughput. Can you check the system viewer tab in the HTML report to make sure the compiler is correctly coalescing the accesses? You can also archive the "reports" folder and post it here, and I will take a look at it.
P.S. Can you post your board model?
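A rough way to see why the port widths matter for throughput is to add them up and convert to a peak request rate. The sketch below does that for the three ports mentioned above. The function name is hypothetical and the 200 MHz kernel clock is a made-up placeholder, not a value from this thread; substitute the frequency from your own report and compare the result against your board's memory bandwidth.

```python
def port_demand_gb_per_s(port_bits, kernel_mhz):
    """Peak request rate in GB/s if every port moves one full word per cycle."""
    bytes_per_cycle = sum(port_bits) / 8
    return bytes_per_cycle * kernel_mhz * 1e6 / 1e9


# Port widths from the post above: input read, output write, mask read.
ports = [512, 512, 128]

# Hypothetical kernel clock (assumption); take the real one from the report.
demand = port_demand_gb_per_s(ports, kernel_mhz=200)
print(f"{demand:.1f} GB/s requested at peak")
```

This is only an upper bound on what the kernel can ask for; stalls, low occupancy, and short bursts all pull the achieved figure below it.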
- ADua0, 6 years ago
Occasional Contributor
I am getting one 512-bit read port for the input, plus one 512-bit port for the mask. I am unrolling by a factor of 3x3 (inner and outer loops), so I am reading 32*9 bits, and the compiler generated one 512-bit port for that. For the output, because I unrolled by a factor of 14, I am getting 14 512-bit ports, each reported as "burst-coalesced write-ack non-aligned". I am assuming this means the accesses are coalesced; is that a safe assumption?
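The 32*9 arithmetic above can be checked with a small sketch. It assumes that the width of a coalesced port is rounded up to the next power of two and capped at 512 bits. That rule is an assumption that happens to match the widths quoted in this thread, not something taken from the compiler documentation, and the function name is hypothetical. The 32-bit element size for the output is also assumed.

```python
def coalesced_port_bits(unroll_factor, element_bits=32, max_port_bits=512):
    """Width of one port serving `unroll_factor` consecutive accesses.

    Assumption: the width is rounded up to the next power of two and
    capped at max_port_bits.
    """
    needed = unroll_factor * element_bits
    width = element_bits
    while width < needed and width < max_port_bits:
        width *= 2
    return width


# 3x3 unroll of 32-bit reads: 9 * 32 = 288 bits requested.
print(coalesced_port_bits(9))

# 14-way unroll of 32-bit writes, if merged into one port: 14 * 32 = 448 bits.
print(coalesced_port_bits(14))
```

Under that assumption, a fully coalesced 14-way write would fit in a single 512-bit port, so seeing 14 separate 512-bit ports suggests the writes are not being merged into one access.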
The profiler reports low occupancy and utilization for the output. The burst size I am getting is 1, out of a possible 16, for both the read and write ports. Do you know how burst reads and writes are controlled, or do you have any suggestions for how I can improve that? I think that could also be a way to improve the results.
My board is an Arria 10 with one DDR4 bank.
I have also attached my reports to this post, so you can take a look at them.