Altera_Forum
Honored Contributor
8 years ago
Reducing memory replication
Hi, I'm working on an OpenCL kernel that uses a 2 MB dataset. Currently I read the entire 2 MB into on-chip memory, perform the operations (random reads/writes), and then write the 2 MB result back to global memory.
I've had no problems doing this as a single work-item kernel, but when I attempt to parallelize the kernel by adding a for-loop with a #pragma unroll 1, the tools report a massive blowup in local-memory usage:

+--------------------------------------------------------------------+
; Estimated Resource Usage Summary                                   ;
+----------------------------------------+---------------------------+
; Resource                               + Usage                     ;
+----------------------------------------+---------------------------+
; Logic utilization                      ;   39%                     ;
; ALUTs                                  ;   21%                     ;
; Dedicated logic registers              ;   19%                     ;
; Memory blocks                          ;  697%                     ;
; DSP blocks                             ;    5%                     ;
+----------------------------------------+---------------------------+

- Private memory: Potentially inefficient configuration
- Requested size: 2097152 bytes
- Implemented size: 33554432 bytes
- Number of banks: 2 (banked on lowest dimension)
- Bank width: 1024 bits
- Bank depth: 8192 words
- Total replication: 16
  - Replicated 16 times to create private copies for simultaneous execution of 16 threads in the loop containing accesses to the array.
- Running memory at 2x clock to support more concurrent ports
- Additional information: Requested size 2097152 bytes, implemented size 33554432 bytes, replicated 16 times total, stallable, 4 reads and 3 writes.
  - Reduce the number of write accesses or fix banking to make this memory system stall-free. Banking may be improved by using compile-time known indexing on lowest array dimension.
  - Replicated 16 times to create private copies for simultaneous execution of 16 threads in the loop containing accesses to the array.
  - Banked on lowest dimension into 2 separate banks.
  - See best practices guide: local memory (https://www.altera.com/documentation/mwh1391807516407.html#chn1469549457114) for more information.
- Private memory implemented in on-chip block RAM.
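
For reference, the sizes in the report are consistent with each other: 2 banks of 8192 words at 1024 bits each give the requested 2097152 bytes, and 16 private copies of that give the implemented 33554432 bytes. A minimal C sketch of that arithmetic follows. The helper names `bytes_per_copy` and `implemented_bytes` are made up for illustration and are not part of any Altera tool, and the sketch ignores how the compiler actually maps the memory onto block RAMs or the 2x clocking.

```c
#include <stdint.h>

/* Hypothetical helper: bytes held by ONE copy of the memory system,
 * computed as banks * depth (words) * width (bits per word) / 8. */
uint64_t bytes_per_copy(uint64_t banks,
                        uint64_t bank_depth_words,
                        uint64_t bank_width_bits)
{
    return banks * bank_depth_words * (bank_width_bits / 8);
}

/* Hypothetical helper: total on-chip bytes once every private copy
 * created for concurrently executing loop iterations is counted. */
uint64_t implemented_bytes(uint64_t bytes_one_copy, uint64_t replication)
{
    return bytes_one_copy * replication;
}
```

With the values from the report, `bytes_per_copy(2, 8192, 1024)` gives 2097152 and `implemented_bytes(2097152, 16)` gives 33554432, so the 16x replication alone accounts for the gap between the requested and implemented sizes.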