Altera_Forum
Honored Contributor
11 years ago
It is the latter; the programming guide recommends that "the entire kernel should have 4 or fewer different accesses to local memory."
Basically, each load/store instruction in the kernel becomes a client (i.e. master) of local memory. Local memory has at most 4 ports, so if you have 4 or fewer load/store instructions, each one gets a port of its own and the instructions do not compete with each other. This guarantees the most efficient hardware.

If you have 3 or fewer store instructions and many loads (loads + stores > 4), the compiler may choose to replicate the local memory. This also gives fast access to local memory, at the expense of RAM blocks.

Further, the compiler performs some optimizations to partition the local memory based on the access patterns. Hence, even if there are more than 4 store instructions, you may still get efficient hardware. However, this depends on the complexity of your access patterns and may not always be possible.
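To make the rule of thumb above concrete, here is a minimal sketch in plain C that sorts a kernel into the three situations described, given how many load and store instructions touch one local memory. It only models the reasoning in this post and is not how the compiler actually decides; the names `classify_local_accesses`, `local_mem_outlook` and `LOCAL_MEM_MAX_PORTS` are made up for illustration.

```c
#include <assert.h>

/* Assumption taken from the post: local memory exposes at most 4 ports. */
#define LOCAL_MEM_MAX_PORTS 4

/* Hypothetical labels for the three situations described in the post. */
typedef enum {
    DEDICATED_PORTS,         /* every load/store gets its own port */
    REPLICATION_CANDIDATE,   /* compiler may replicate the memory, costing RAM */
    DEPENDS_ON_PARTITIONING  /* efficient only if access patterns can be partitioned */
} local_mem_outlook;

/* Rough model of the rule of thumb, not the compiler's real algorithm.
 * loads/stores count the load and store instructions (access sites) in
 * the kernel that touch the same local memory. */
local_mem_outlook classify_local_accesses(int loads, int stores)
{
    assert(loads >= 0 && stores >= 0);

    /* 4 or fewer accesses in total: one port per access, no contention. */
    if (loads + stores <= LOCAL_MEM_MAX_PORTS)
        return DEDICATED_PORTS;

    /* More than 4 accesses but 3 or fewer stores: replication is possible. */
    if (stores <= 3)
        return REPLICATION_CANDIDATE;

    /* Otherwise it comes down to whether the compiler can partition. */
    return DEPENDS_ON_PARTITIONING;
}
```

For example, a kernel with 6 loads and 2 stores to the same local array falls into the replication case, while 1 load and 5 stores only stays efficient if the compiler manages to partition the memory.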