Altera_Forum
Honored Contributor
8 years ago
The compiler combines consecutive accesses from unrolled loop iterations into larger coalesced accesses to improve memory throughput. I am not exactly sure where the LSU (load/store unit) sits in the design; most likely it is between the kernel and the memory controller.
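To make the coalescing point concrete, here is a rough back-of-the-envelope model in plain C. It is only a sketch of the idea, not something taken from the compiler: the function names `count_requests` and `request_bytes` are made up for illustration, and it assumes every group of unrolled accesses is perfectly consecutive and gets merged into a single wider request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not compiler output.
 * Counts the memory requests issued for n consecutive element loads
 * when `unroll` adjacent loads are merged into one wider request.
 * Assumes perfectly consecutive accesses. */
size_t count_requests(size_t n, size_t unroll)
{
    assert(unroll > 0);
    /* Round up: a final partial group still costs one request. */
    return (n + unroll - 1) / unroll;
}

/* Width in bytes of one merged request under the same assumption. */
size_t request_bytes(size_t elem_bytes, size_t unroll)
{
    assert(unroll > 0);
    return elem_bytes * unroll;
}
```

Under this model, 1024 consecutive 4-byte loads with an unroll factor of 8 turn into `count_requests(1024, 8)` = 128 requests of `request_bytes(4, 8)` = 32 bytes each, instead of 1024 requests of 4 bytes.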
Regarding latency, my understanding is that the latency shown for an LSU in the report is the number of registers the compiler inserts into the pipeline to absorb stalls from memory accesses. If an access is stalled for fewer clock cycles than that, only bubbles are inserted into the pipeline. If the stall lasts longer, the whole pipeline is stalled. Please note that this is not really documented anywhere, and I could well be wrong.
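The stall behavior I described can be written down as a small C model. Again, this only encodes my understanding and is not documented hardware behavior: `stall_result` and `absorb_stall` are hypothetical names, and treating a stall exactly equal to the reported latency as fully absorbed is my own assumption.

```c
#include <assert.h>

/* Hypothetical model of the behavior described above; not documented. */
typedef struct {
    unsigned bubbles;        /* stall cycles hidden by the inserted registers */
    unsigned pipeline_stall; /* cycles for which the whole pipeline stalls */
} stall_result;

/* lsu_latency: register stages reported for the LSU.
 * stall:       cycles a memory access is delayed.
 * Assumption: a stall equal to lsu_latency is still fully absorbed. */
stall_result absorb_stall(unsigned lsu_latency, unsigned stall)
{
    stall_result r;
    r.bubbles = stall < lsu_latency ? stall : lsu_latency;
    r.pipeline_stall = stall - r.bubbles;
    /* Sanity check: every stall cycle is either absorbed or exposed. */
    assert(r.bubbles + r.pipeline_stall == stall);
    return r;
}
```

For example, with a reported latency of 100, a 40-cycle stall gives 40 bubbles and no pipeline stall, while a 130-cycle stall gives 100 bubbles and stalls the whole pipeline for the remaining 30 cycles.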