Altera_Forum
Honored Contributor
8 years ago
Not all work-groups run fully in parallel on the FPGA; the compiler decides how many of them can run concurrently. M20K utilization depends on three things: the number of accesses to the buffer per work-group (which depends on the code and can also be affected by the SIMD size), the number of work-groups running in parallel per compute unit (decided by the compiler), and the number of compute units (set by the user). The compiler report explicitly states why and how many times each local buffer is replicated, and what its total size will be.
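To see how those three factors can multiply, here is a rough back-of-the-envelope sketch. It is not the compiler's actual algorithm: the function name, the parameters, and the simple multiplicative model are assumptions made for illustration, and it ignores M20K width/depth configurations, banking, and any other optimization the compiler may apply. The compiler report remains the authoritative source for the real numbers.

```python
M20K_BITS = 20 * 1024  # one M20K block holds 20 Kbit


def estimate_m20k_blocks(buffer_bytes, port_replication,
                         workgroups_per_cu, num_compute_units):
    """Hypothetical estimate of the M20K blocks used by one local buffer.

    buffer_bytes      -- size of the local buffer as declared in the kernel
    port_replication  -- copies made to serve the accesses of one work-group
                         (assumed value; read the real one from the report)
    workgroups_per_cu -- work-groups in flight per compute unit
                         (chosen by the compiler)
    num_compute_units -- number of compute units (chosen by the user)
    """
    buffer_bits = buffer_bytes * 8
    # Ceiling division: a partially filled block still costs a whole block.
    blocks_per_copy = -(-buffer_bits // M20K_BITS)
    copies = port_replication * workgroups_per_cu * num_compute_units
    return blocks_per_copy * copies


# Made-up example: a 16 KiB buffer, replicated 2x for its accesses,
# 2 work-groups in flight per compute unit, 4 compute units.
print(estimate_m20k_blocks(16 * 1024, 2, 2, 4))  # prints 112
```

In this made-up case a buffer that would fit in 7 blocks ends up costing 112, which is why a small local buffer can still account for a large share of the M20K usage shown in the report.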