Re: Unexpected Error for the different FOR loop trip count

Hi there,

Thank you for your reply. We still do not understand why the kernel failed with the FPGA OpenCL flow while the same OpenCL code ran correctly on the CPU, but we have since changed our code and it now works. Thank you.

Unexpected Error for the different FOR loop trip count

I have been trying to perform panel-by-panel matrix multiplication at the block level. The pseudo-algorithm is as follows:

```c
__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void matmul_panel_fpga_cl(__global float *a,
                                   __global float *b,
                                   __global float *c,
                                   const int m, const int n, const int k,
                                   const int num_of_m_blocks,
                                   const int num_of_n_blocks)
{
    // mb: first M block of the current group (M_STEP blocks per group)
    for (int mb = 0; mb < num_of_m_blocks; mb += M_STEP) {   // M blocks in panel A
        for (int it = 0; it < M_STEP; it++) {
            pack_a_matrix();
        }
        for (int bb = 0; bb < num_of_n_blocks; bb++) {       // N blocks in panel B
            pack_b_matrix();
            for (int ab = 0; ab < M_STEP; ab++) {
                pack_c_matrix();
                packed_matrix_multiply();   // packed C += packed A * packed B
                return_pack_c();
            }
        }
    }
}
```

The above kernel works fine when the number of kernel invocations (in other words, the number of panels) is equal to num_of_n_blocks. But when they differ, it returns garbage values in the packed C matrix. I call clFinish() after every invocation, but I do not understand how num_of_n_blocks could be related to the number of kernel invocations. I have been using OpenCL FPGA SDK 20.3.

Please help us understand.