Altera_Forum
Honored Contributor
8 years ago
local memory bank
I have read the Best Practices Guide, but I am still confused.
I have optimized the local memory to 1 read and 1 write. However, report.html reports that the "w_local" memory uses 64 RAM blocks. I know the multiply is unrolled 64 times, so I need to fetch 64 values (64 * 16 = 1024 bits) in one clock cycle. But since the local memory is optimized to 1 read, and each read fetches 1024 bits, it should therefore use only 1 RAM block, not 64 RAM blocks, right?
typedef struct {
    short ff[16];        // 16 values of 16 bits each
} filter_trans;
typedef struct {
    filter_trans ww[4];  // 4 x 16 = 64 values = 1024 bits
} data_trans;
typedef struct {
    filter_trans ww[4];  // same layout as data_trans
} weight_trans;
__kernel(){
    weight_trans w_local;
    data_trans data_in = read_channel_intel(data_ch);
    cont control = read_channel_intel(cont_ch);
    weight_trans get_w = w_local;
    // Both loops are fully unrolled: 4 x 16 = 64 multiplies per cycle
    #pragma unroll
    for (int n = 0; n < 4; n++) {
        winograd = 0;
        #pragma unroll
        for (int j = 0; j < 16; j++) {
            winograd += get_w.ww[n].ff[j] * data_in.ww[n].ff[j];
        }
    }
}
"w_local"
Private memory: Optimal
Requested size: 73728 bytes
Implemented size: 131072 bytes
Number of banks: 1
Bank width: 1024 bits
Bank depth: 1024 words
Total replication: 1
Additional information:
- Requested size 73728 bytes, implemented size 131072 bytes, stall-free, 1 read and 1 write.
- See Best Practices Guide: Local Memory for more information.
- Private memory implemented in on-chip block RAM.
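For reference, the numbers in the report can be related to each other with a rough calculation. This is only a sketch under stated assumptions, not a description of what the compiler actually does. The helper names (`ceil_div`, `next_pow2`, `ram_blocks`) are made up for illustration. The power-of-two rounding of the depth is inferred from the report (576 requested words, 1024 implemented). The 1024-word x 16-bit configuration of one physical RAM block is an assumption, since the post does not name the target device. The point is that one bank with one read port is a logical memory: a single physical block has a narrow data port, so a 1024-bit word has to be spread across many blocks side by side.

```c
#include <assert.h>

/* Ceiling division for positive integers. */
unsigned ceil_div(unsigned a, unsigned b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Smallest power of two that is >= x (for x >= 1). */
unsigned next_pow2(unsigned x) {
    unsigned p = 1;
    while (p < x) {
        p <<= 1;
    }
    return p;
}

/* Depth in words of a memory of `bytes` bytes with `width_bits`-wide words. */
unsigned words_needed(unsigned bytes, unsigned width_bits) {
    return ceil_div(bytes * 8u, width_bits);
}

/*
 * Physical RAM blocks needed for one logical memory.
 * ASSUMPTION: blocks are tiled side by side to reach the word width and
 * stacked to reach the depth; blk_width_bits and blk_depth_words describe
 * one physical block and depend on the device (not stated in the post).
 */
unsigned ram_blocks(unsigned width_bits, unsigned depth_words,
                    unsigned blk_width_bits, unsigned blk_depth_words) {
    return ceil_div(width_bits, blk_width_bits) *
           ceil_div(depth_words, blk_depth_words);
}
```

With the report's values, `words_needed(73728, 1024)` is 576 and `next_pow2(576)` is 1024, which gives 1024 words * 128 bytes = 131072 bytes, the implemented size. Under the assumed 1024 x 16 block configuration, `ram_blocks(1024, 1024, 16, 1024)` is 64, which matches the 64 RAM blocks in report.html even though there is only one bank with one read and one write.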