Altera_Forum
9 years ago

AOC does not understand the concept of "single work-group" kernels (not to be confused with "single work-item" kernels), even if you only use get_local_id() and nothing else. By default, it always assumes that a certain number of work-groups will run in parallel in each compute unit, so you always get some extra replication factor for local memory instances (usually 3, but I have seen up to 45!) that you might not need. From the Best Practices Guide v16.0, page 1-30 (https://www.altera.com/en_us/pdfs/literature/hb/opencl-sdk/archives/ug-aocl-best-practices-guide-16.0.pdf) (this passage was removed in 16.1):
--- Quote Start ---
Number of simultaneous work-groups is the maximum number of work-groups that the kernel can process at the same time. To increase throughput, the kernel might execute threads from different work-groups simultaneously (that is, that kernel does not wait to fully complete one work-group before starting another work-group). If a kernel can process multiple simultaneous work-groups and has local memory, the size of the local memory must increase to store data from each simultaneous work-group. This local memory replication might increase the usage of block RAM. Currently, you do not have the ability to modify the number of simultaneous work-groups directly.
--- Quote End ---

I don't think there is any work-around for this (at least not a public one) but you could probably open a service ticket directly with Altera and ask them.
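To get a rough feel for what that replication costs, here is a minimal sketch in C. It assumes the simplest possible model, where each simultaneous work-group gets one full copy of the kernel's local memory. The function name and the model itself are my own illustration, not anything AOC exposes, and the real block RAM usage also depends on things like port count and banking, so check the area report for the actual numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: estimates the local memory footprint under the
 * assumption that every simultaneous work-group needs its own full copy
 * of the kernel's local memory. This is a simplified model, not AOC's
 * actual allocation logic. */
static size_t replicated_local_bytes(size_t local_bytes_per_group,
                                     size_t simultaneous_groups)
{
    /* At least one work-group is always resident in the compute unit. */
    assert(simultaneous_groups >= 1);
    return local_bytes_per_group * simultaneous_groups;
}
```

For example, under this model a 16 KiB local buffer (16384 bytes) would take 49152 bytes with a replication factor of 3, and 737280 bytes with a factor of 45, which is why the extra copies hurt so much in a kernel that only ever runs one work-group.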