Altera_Forum
Honored Contributor
8 years ago
Parallel accesses to banked local memory
I've been trying to bank local memory so that I can access it in parallel, but I can't get it to work. The compiler always either replicates the memory or generates banked memory with stalls. I don't understand why the code snippet below doesn't generate a set of parallel BRAMs that can be accessed concurrently.
#define L 64

__kernel
__attribute__((task))
void test( __global char * restrict message, __global char * restrict decodedData )
{
    local char __attribute__((numbanks(L), bankwidth(1))) msgMem[L][256];
    int __attribute((register)) Lrji_row_sum[L];

    //store data across L memory banks
    for(uint k=0; k<256; k++) {
        for(uint r=0; r<L; r++) {
            msgMem[r][k]=message[(k*L)+r];
        }
    }

    mem_fence(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

    //accumulator for each memory bank
    for(uint i=0; i<256; i++) {
        #pragma unroll
        for(uint r=0; r<L; r++) {
            Lrji_row_sum[r]=msgMem[r][i];
        }
    }

    for(uint r=0; r<L; r++) {
        decodedData[r] = Lrji_row_sum[r];
    }
}

I'm using Quartus 17.0.1.

Appreciate the help,
Jason
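One way to reason about the kernel above is to model on the host which bank each of the L unrolled reads would land in. The sketch below is plain C, not kernel code. It rests on an assumption that should be checked against the compiler's report: that the bank index is taken from the low-order address bits, i.e. bank = (byte address / bankwidth) % numbanks. The helper names `bank_of` and `distinct_banks` are hypothetical and exist only for this illustration.

```c
#define L 64          /* number of rows, as in the kernel */
#define NUMBANKS 64   /* numbanks(L) */

/* ASSUMPTION: the bank is selected by the low-order bits of the address,
   in units of bankwidth bytes. This is a model, not the compiler's code. */
static unsigned bank_of(unsigned byte_addr, unsigned bankwidth)
{
    return (byte_addr / bankwidth) % NUMBANKS;
}

/* Counts how many distinct banks the L unrolled reads touch for a fixed i.
   The byte address of element (r, i) is r * row_stride + i * col_stride:
     msgMem[L][256] read as msgMem[r][i]  -> row_stride = 256, col_stride = 1
     msgMem[256][L] read as msgMem[i][r]  -> row_stride = 1,   col_stride = L */
static unsigned distinct_banks(unsigned row_stride, unsigned col_stride,
                               unsigned bankwidth, unsigned i)
{
    unsigned char seen[NUMBANKS] = {0};
    unsigned count = 0;
    for (unsigned r = 0; r < L; r++) {
        unsigned b = bank_of(r * row_stride + i * col_stride, bankwidth);
        if (!seen[b]) {
            seen[b] = 1;
            count++;
        }
    }
    return count;
}
```

Under this model, `distinct_banks(256, 1, 1, i)` (the layout and `bankwidth(1)` used in the kernel) returns 1 for any i, because 256 is a multiple of 64 and every r maps to bank i % 64. Both `distinct_banks(256, 1, 256, i)` (same layout, bank width equal to one full row) and `distinct_banks(1, L, 1, i)` (dimensions swapped to `msgMem[256][L]`) return 64. If the assumption holds, that would explain the stalls or replication. The model only covers the unrolled read loop, so the store loop may still constrain what the compiler generates.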