Forum Discussion
Private memory is generally implemented using registers and does not use Block RAMs, while large local memory buffers are implemented using Block RAMs. Compute unit (CU) replication scales Block RAM usage with the CU factor, since it replicates the whole pipeline.

The effect of SIMD is less straightforward. If accesses to your local buffer can still be coalesced under SIMD, the number of ports to that buffer does not change, so the replication factor stays the same and Block RAM utilization hardly changes. If, however, the accesses are not consecutive and cannot be coalesced, then SIMD increases the number of ports by the SIMD factor and can significantly increase the replication factor.

The Block RAM replication factor depends on the number of barriers in the kernel, the number of accesses to the buffer, and the number of work-groups the compiler decides to run simultaneously. The last of these cannot be directly controlled by the user. Some attributes are provided to control banking and the number of ports for local memory buffers. You can find the details in "Intel FPGA SDK for OpenCL Best Practices Guide, Section 7.5 - Optimizing Accesses to Local Memory by Controlling the Memory Replication Factor".
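To make the SIMD point concrete, here is a small toy model in plain C (not OpenCL, and not what the compiler actually does internally). It checks whether the buffer indices touched by one group of SIMD lanes are consecutive, and reports one port if they are and one port per lane if they are not. The names `index_fn`, `lanes_are_consecutive`, `ports_needed`, `unit_stride` and `strided_by_16` are made up for this illustration; the real compiler also takes alignment, banking and other factors into account.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: maps a work-item's local ID to the local-buffer index it accesses. */
typedef size_t (*index_fn)(size_t local_id);

/* Returns 1 if the `simd` lanes starting at work-item `first_id` access
 * consecutive buffer indices (so one wide access could serve all lanes),
 * and 0 otherwise. */
int lanes_are_consecutive(index_fn f, size_t first_id, size_t simd) {
    assert(simd > 0);
    size_t base = f(first_id);
    for (size_t lane = 1; lane < simd; ++lane) {
        if (f(first_id + lane) != base + lane) {
            return 0;
        }
    }
    return 1;
}

/* Toy port estimate for a single access site: 1 port when the lanes
 * coalesce, `simd` ports when they do not. */
size_t ports_needed(index_fn f, size_t first_id, size_t simd) {
    return lanes_are_consecutive(f, first_id, simd) ? 1 : simd;
}

/* Example access patterns (assumptions for illustration). */
size_t unit_stride(size_t local_id)   { return local_id; }       /* buf[lid]      */
size_t strided_by_16(size_t local_id) { return local_id * 16; }  /* buf[lid * 16] */
```

With a SIMD factor of 4, `ports_needed(unit_stride, 0, 4)` gives 1, while `ports_needed(strided_by_16, 0, 4)` gives 4. The second case is the one where the extra ports can push the replication factor, and with it Block RAM usage, up.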