Unexpected error for a different FOR loop trip count
I have been trying to perform panel-by-panel matrix multiplication at the block level. The pseudo-algorithm is as follows:

    __attribute__((reqd_work_group_size(1, 1, 1)))
    __kernel void matmul_panel_fpga_cl(__global float *a,
                                       __global float *b,
                                       __global float *c,
                                       const int m,
                                       const int n,
                                       const int k,
                                       const int num_of_m_blocks,
                                       const int num_of_n_blocks)
    {
        for (int a = 0; a < num_of_m_blocks; a += M_STEP) {    // num_of_m_blocks in panelA
            for (int it = 0; it < M_STEP; it++) {
                pack_a_matrix();
            }
            for (int bb = 0; bb < num_of_n_blocks; bb++) {     // num_of_n_blocks in panelB
                pack_b_matrix();
                for (int ab = 0; ab < M_STEP; ab++) {
                    pack_c_matrix();
                    packed_matrix_multiply();                  // packed c = a * b
                    return_pack_c();
                }
            }
        }
    }

The above kernel works fine when the number of kernel invocations (in other words, the number of panels) is equal to num_of_n_blocks. When the two differ, it returns garbage values in packed_c.

I have used clFinish() every time, but I do not understand how num_of_n_blocks relates to the number of kernel invocations. I am using OpenCL FPGA SDK 20.3.

Please help us understand.
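For comparison, the loop nest above can be sketched in plain C on the host, so that the values read back from the device can be checked against a known-good result. This is only a minimal sketch under several assumptions that are not stated in the post: matrices are row-major, `a` is m x k, `b` is k x n, the block edge `BLOCK` and the value of `M_STEP` are placeholders, and the packing steps are replaced by direct indexing. The function names `matmul_reference`, `matmul_panel_host` and `count_mismatches` are hypothetical helpers, not part of the kernel or the SDK.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK  4   /* assumed block edge length (not given in the post) */
#define M_STEP 2   /* assumed number of A blocks handled per panel */

/* Naive reference: c = a * b, row-major, a is m x k, b is k x n. */
static void matmul_reference(const float *a, const float *b, float *c,
                             int m, int n, int k)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
}

/* Same loop order as the kernel pseudocode, with packing replaced by
 * direct indexing into the full matrices. */
static void matmul_panel_host(const float *a, const float *b, float *c,
                              int m, int n, int k)
{
    /* Assumption: dimensions divide evenly into blocks and panels. */
    assert(m % (BLOCK * M_STEP) == 0 && n % BLOCK == 0);
    const int num_of_m_blocks = m / BLOCK;
    const int num_of_n_blocks = n / BLOCK;

    for (int ma = 0; ma < num_of_m_blocks; ma += M_STEP) {   /* panel of A */
        for (int bb = 0; bb < num_of_n_blocks; bb++) {       /* block of B */
            for (int ab = 0; ab < M_STEP; ab++) {            /* block in panel */
                const int row0 = (ma + ab) * BLOCK;
                const int col0 = bb * BLOCK;
                for (int i = 0; i < BLOCK; i++)
                    for (int j = 0; j < BLOCK; j++) {
                        float acc = 0.0f;
                        for (int p = 0; p < k; p++)
                            acc += a[(row0 + i) * k + p] * b[p * n + col0 + j];
                        c[(row0 + i) * n + col0 + j] = acc;
                    }
            }
        }
    }
}

/* Counts elements that differ by more than tol; NaN counts as a mismatch. */
static int count_mismatches(const float *x, const float *y, size_t len, float tol)
{
    int bad = 0;
    for (size_t i = 0; i < len; i++) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))
            bad++;
    }
    return bad;
}
```

Running `matmul_panel_host` with a block count in n that differs from the number of panels, and comparing its output to the buffer read back from the device with `count_mismatches`, would show which output blocks hold garbage and whether they line up with particular `bb` iterations.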