Altera_Forum
Honored Contributor
7 years ago
1- Every access to global memory, anywhere in the code, gets its own port to external memory. Ports are never shared between different accesses unless those accesses can be coalesced at compile time.
2- None of those accesses will be large burst accesses; they will all be 32-bit accesses. You might expect that, since your accesses are consecutive and in a loop, they would be coalesced into a larger access at runtime, but this will not happen, because the memory interface does not perform runtime coalescing. The only way to get higher memory performance is compile-time coalescing, which you achieve by unrolling the loop that iterates over the consecutive accesses. The "System viewer" in the area report shows all the ports along with their size and type. If you unroll the outer loop, you will get four 32-bit accesses, whereas if you unroll the inner loop, you will get one 128-bit access. The latter will give you far better memory performance, but also an II of 16 caused by the floating-point accumulation; that can be fixed using shift register inference.
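To make the two points above concrete, here is a minimal sketch in plain C of what the inner-loop unrolling plus shift register inference might look like. The original kernel is not shown in this thread, so everything here is an assumption for illustration: the function name `sum_coalesced`, the reduction itself, and `SR_DEPTH` (set to 16 only to match the II quoted above; the right depth depends on your device and on what the compiler report says). In a real OpenCL kernel the function would be a `__kernel`, the pointer would be `__global const float *restrict`, and the commented-out pragmas would be active.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL   4   /* four consecutive 32-bit floats = one 128-bit access */
#define SR_DEPTH 16  /* assumed depth, chosen to match the II of 16 above   */

/* Hypothetical reduction over n floats; n must be a multiple of UNROLL. */
float sum_coalesced(const float *data, size_t n)
{
    float shift_reg[SR_DEPTH + 1];
    float total = 0.0f;

    assert(n % UNROLL == 0);  /* host-side check only, not for the kernel */

    /* Clear the shift register. */
    for (size_t j = 0; j <= SR_DEPTH; j++)
        shift_reg[j] = 0.0f;

    for (size_t i = 0; i < n; i += UNROLL) {
        float partial = 0.0f;

        /* #pragma unroll  (in the kernel: fully unrolled, so the four
           consecutive reads can be coalesced at compile time) */
        for (size_t k = 0; k < UNROLL; k++)
            partial += data[i + k];

        /* Accumulate into the tail using the value at the head, which was
           written SR_DEPTH iterations ago, instead of the previous sum. */
        shift_reg[SR_DEPTH] = shift_reg[0] + partial;

        /* #pragma unroll  (in the kernel: shift every element down by one) */
        for (size_t j = 0; j < SR_DEPTH; j++)
            shift_reg[j] = shift_reg[j + 1];
    }

    /* #pragma unroll  (in the kernel: final reduction of the partial sums;
       the tail element duplicates shift_reg[SR_DEPTH - 1], so skip it) */
    for (size_t j = 0; j < SR_DEPTH; j++)
        total += shift_reg[j];

    return total;
}
```

Calling `sum_coalesced` on an array holding the values 0 through 63 returns 2016. Note that the shift register changes the order of the floating-point additions, so for general data the result can differ slightly from a plain sequential sum. Check the System viewer after compiling to confirm you actually got a single 128-bit port and the II you expected.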