Altera_Forum
8 years ago
1) No. Try vectorization (num_simd_work_items) before compute unit replication (num_compute_units). Both consume more FPGA resources, but num_simd_work_items uses fewer, because it widens a single pipeline instead of duplicating the entire compute unit.
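Here is a minimal sketch of the two options in OpenCL C for the Intel FPGA SDK. The kernel names, the 64-item work-group size and the factors 4 and 2 are hypothetical choices. Both options are requested through kernel attributes that the offline compiler reads. num_simd_work_items also needs a reqd_work_group_size whose first dimension is evenly divisible by the SIMD factor. This is device code for the offline compiler, not a standalone host program.

```c
/* Option A (try first): vectorize one pipeline so it processes 4
   work-items per step. Requires reqd_work_group_size, and 64 is
   divisible by the SIMD factor 4. */
__attribute__((reqd_work_group_size(64, 1, 1)))
__attribute__((num_simd_work_items(4)))
__kernel void vadd_simd(__global const float *restrict a,
                        __global const float *restrict b,
                        __global float *restrict c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}

/* Option B (only if more throughput is still needed): replicate the
   whole compute unit, including its control logic and load/store units. */
__attribute__((num_compute_units(2)))
__kernel void vadd_cu(__global const float *restrict a,
                      __global const float *restrict b,
                      __global float *restrict c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
```

To see the area difference for your own kernel, compile each variant and compare the resource estimates in the optimization report.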
3) Synchronizing work-items before a memory access (for example, with a barrier after loading data into local memory) costs much less than repeatedly accessing global memory. Again, check the optimization report and use the profiler to see the effects on your design.
4) All pipeline hardware, including the load/store units, is generated during offline compilation. The type of each load/store unit, and whether its accesses can be coalesced, is therefore also decided at that point.