Forum Discussion
The thread capacity in the report is basically just the length of the pipeline for that specific block. It shows the maximum number of threads that can be in flight simultaneously in that block. It doesn't mean that many threads will actually be in flight, nor that your local memory buffers are replicated by that factor.

The local memory replication factor depends on two things: the number of read and write accesses to the local memory block, and the number of work-groups the compiler decides to run simultaneously. Apparently the latter is the total length of all the pipelines in the kernel divided by the work-group size, which sometimes ends up being an absurd number, on the order of tens or even a hundred. If you post what the report says about why and how many times your local memory buffers are replicated, it would be easier to find a way to reduce it.
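To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not how the compiler actually computes replication. It assumes (1) concurrent work-groups = total pipeline depth (the per-block thread capacities summed) divided by the work-group size, rounded up, and (2) a simplified, hypothetical port model where every write must reach every copy of the buffer and the remaining ports on each copy serve reads. The function names and the `ports_per_copy` parameter are made up for illustration, not a documented hardware figure.

```python
import math


def concurrent_work_groups(pipeline_lengths, work_group_size):
    """Estimate work-groups in flight: total pipeline depth of the kernel
    divided by the work-group size (rounding up is an assumption)."""
    total_depth = sum(pipeline_lengths)
    return max(1, math.ceil(total_depth / work_group_size))


def copies_per_work_group(num_reads, num_writes, ports_per_copy):
    """Hypothetical port model: each write goes to every copy, so each copy
    has (ports_per_copy - num_writes) ports left over for reads."""
    read_ports_per_copy = ports_per_copy - num_writes
    if read_ports_per_copy <= 0:
        raise ValueError("too many writes for this simplified port model")
    return max(1, math.ceil(num_reads / read_ports_per_copy))


def total_replication(pipeline_lengths, work_group_size,
                      num_reads, num_writes, ports_per_copy):
    """Return (work-groups in flight, copies per work-group, total copies)."""
    wgs = concurrent_work_groups(pipeline_lengths, work_group_size)
    per_wg = copies_per_work_group(num_reads, num_writes, ports_per_copy)
    return wgs, per_wg, wgs * per_wg


if __name__ == "__main__":
    # Made-up numbers: three blocks with thread capacities 240, 180 and 600,
    # work-group size 64, six reads and one write to the same local buffer.
    print(total_replication([240, 180, 600], 64, 6, 1, ports_per_copy=4))
```

With these made-up numbers the sketch gives 16 concurrent work-groups and 2 copies per work-group, i.e. 32 copies in total. It shows why a deep pipeline combined with a small work-group size can blow up the replication count even when the access pattern itself only needs a couple of copies.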