ash_apee
New Contributor
3 years ago

Unexpected error when the FOR loop trip count differs

I have been trying to perform panel-by-panel matrix multiplication at the block level. The pseudocode is as follows:

__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void matmul_panel_fpga_cl(
    __global float *a,
    __global float *b,
    __global float *c,
    const int m,
    const int n,
    const int k,
    const int num_of_m_blocks,
    const int num_of_n_blocks)
{
    // Loop over the num_of_m_blocks blocks of panel A, M_STEP blocks at a time.
    // The counter is named mb so that it does not shadow the pointer argument a.
    for (int mb = 0; mb < num_of_m_blocks; mb += M_STEP) {
        for (int it = 0; it < M_STEP; it++) {
            pack_a_matrix();
        }
        // Loop over the num_of_n_blocks blocks of panel B.
        for (int bb = 0; bb < num_of_n_blocks; bb++) {
            pack_b_matrix();
            for (int ab = 0; ab < M_STEP; ab++) {
                pack_c_matrix();
                packed_matrix_multiply_c_a_b();  // c = a * b on the packed blocks
                return_pack_c();
            }
        }
    }
}

The kernel above works fine when the number of kernel invocations (in other words, the number of panels) is equal to num_of_n_blocks. When the two differ, it returns garbage values in packed_c. I call clFinish() every time, but I do not understand how num_of_n_blocks is related to the number of kernel invocations.
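To check what the output should be for a given pair of block counts, a small host-side reference may help. The following is a minimal sketch in plain C, not the kernel itself. It assumes row-major matrices and that m and n divide evenly by num_of_m_blocks and num_of_n_blocks. The helper names matmul_naive, matmul_blocked, and max_abs_diff are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference: c (m x n) = a (m x k) * b (k x n), all row-major. */
void matmul_naive(const float *a, const float *b, float *c,
                  int m, int n, int k)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Blocked version: outer loop over row blocks of a, inner loop over
 * column blocks of b. The block sizes are an assumption of this sketch. */
void matmul_blocked(const float *a, const float *b, float *c,
                    int m, int n, int k,
                    int num_of_m_blocks, int num_of_n_blocks)
{
    assert(m % num_of_m_blocks == 0);
    assert(n % num_of_n_blocks == 0);
    int bm = m / num_of_m_blocks;  /* rows per block of a */
    int bn = n / num_of_n_blocks;  /* columns per block of b */

    for (int mb = 0; mb < num_of_m_blocks; mb++) {
        for (int nb = 0; nb < num_of_n_blocks; nb++) {
            /* Compute one bm x bn block of c. */
            for (int i = mb * bm; i < (mb + 1) * bm; i++) {
                for (int j = nb * bn; j < (nb + 1) * bn; j++) {
                    float acc = 0.0f;
                    for (int p = 0; p < k; p++)
                        acc += a[i * k + p] * b[p * n + j];
                    c[i * n + j] = acc;
                }
            }
        }
    }
}

/* Largest absolute element-wise difference between two buffers. */
float max_abs_diff(const float *x, const float *y, size_t len)
{
    float worst = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Running matmul_blocked with several different (num_of_m_blocks, num_of_n_blocks) pairs and comparing against matmul_naive with max_abs_diff gives a known-good result to hold the FPGA output against, whether or not the block counts match the number of kernel invocations.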

I am using OpenCL FPGA SDK 20.3.

Please help me understand.

4 Replies