Double Buffering in OpenCL
Hi All,
I'm trying to adopt the "double buffering" technique in one of my OpenCL codes. I have seen in a few research papers that double buffering can help boost performance. However, they all used the SDAccel toolchain by Xilinx. Now I want to do the same on Arria 10 FPGAs (Nallatech P385A), using OpenCL.
Here is the kind of code I have:
    __local lane_data win_buffer[2][WIN_BUF_SIZE];

    for (unsigned int out_idx_xyz = 0; out_idx_xyz < (weight_dim4_div_lane*group_num_y*group_num_x); out_idx_xyz++) {
        flag = out_idx_xyz & 0x01; // ping-pong flag: selects the half that is read in this iteration
        #pragma ivdep array(win_buffer)
        for (unsigned int win_itm_xyz = 0; win_itm_xyz < item_loop_bound; win_itm_xyz++) {
            ....
            if (win_itm_z < weight_dim3/VEC_SIZE) {
                .....
                // store into the other half (the one not being read in this iteration)
                win_buffer[(~flag)&0x01][win_itm_z*win_size_y*win_size_x + win_itm_y*win_size_x + win_itm_x] = data_vec;
                .....
            }
            if (gp_num_x*CONV_GP_SIZE_X + gp_item_idx_x < conv_x) {
                ......
                // load from the half selected by flag
                data_vec = win_buffer[flag][output_idx_dim3*win_size_y*win_size_x + output_idx_dim2*win_size_x + (output_idx_dim1 + gp_item_idx_x*stride)];
                ......
            }
            ......
        }
    }
As you can see, `win_buffer` is meant to act as a double buffer: in each outer-loop iteration, one half is written while the other half is read. Unfortunately, the compiler treats the load and store to this buffer as a dependency across iterations of the outer loop. I'm really not sure how to instruct the compiler to infer a double buffer for `win_buffer`.
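To make the intended access pattern concrete, here is a minimal plain-C model of the ping-pong indexing. It is only a sketch and not the actual kernel: `ping_pong_sum`, `src`, and the small `WIN_BUF_SIZE` value are made up for illustration. Within one outer iteration the store and the load always target different halves, and the data stored in one outer iteration is only read in the next one.

```c
#include <stddef.h>

#define WIN_BUF_SIZE 8 /* hypothetical size, for illustration only */

/*
 * Plain-C model of the ping-pong pattern (hypothetical helper).
 * `src` must hold at least rounds * WIN_BUF_SIZE elements.
 * Returns the sum of all values loaded from the "read" half.
 */
long ping_pong_sum(const int *src, size_t rounds)
{
    int win_buffer[2][WIN_BUF_SIZE] = {{0}};
    long total = 0;

    for (size_t out_idx = 0; out_idx < rounds; out_idx++) {
        unsigned flag  = (unsigned)(out_idx & 0x01); /* half read in this iteration */
        unsigned other = flag ^ 0x01;                /* half written in this iteration */

        for (size_t i = 0; i < WIN_BUF_SIZE; i++) {
            /* Store: fill the other half for the NEXT outer iteration. */
            win_buffer[other][i] = src[out_idx * WIN_BUF_SIZE + i];

            /* Load: consume what the PREVIOUS outer iteration stored
               (all zeros in the very first iteration). */
            total += win_buffer[flag][i];
        }
    }
    return total;
}
```

In this model the data written during the last outer iteration is never consumed, and the first iteration reads an empty half, which is the usual prologue/epilogue cost of a ping-pong scheme.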
Does anyone have any specific suggestions regarding this issue?
If the OpenCL compiler is not mature enough to handle this, would I have to split my kernel into two kernels and implement the double buffering manually?
Thanks