Hi HRZ, here is the report for one of my local memories:
conv.cl:149 (data):
Local memory: Potentially inefficient configuration.
Requested size 65536 bytes (rounded up to nearest power of 2), implemented size 458752 bytes, replicated 7 times total, stallable, 64 reads and 1 write. Additional information:
- Reduce the number of write accesses or fix banking to make this memory system stall-free. Banking may be improved by using compile-time known indexing on lowest array dimension.
- Replicated 7 times to create private copies for simultaneous execution of 7 threads in the loop containing accesses to the array.
- Banked on lowest dimension into 64 separate banks (this is a good thing).
I don't understand what the seven threads are.