Forum Discussion
Is this with emulation or actual FPGA execution? Printf data is cached in Block RAMs and dumped after kernel execution on the actual FPGA (while it is dumped right away in the emulator) and hence, if the kernel runs into a deadlock, the printf output will never be displayed (printf is useless for debugging deadlocks on the FPGA). Other than that, I don't see any reason why your code above would deadlock unless there is some problem in your host code; e.g. one kernel is waiting for an event from the other or there is a clFinish on the producer kernel before the consumer kernel is launched. Your description very well matches a synchronization problem in the host code rather than the kernels.
Thanks, HRZ.
I just tested this with emulator (with the option "-emulator-channel-depth-model=strict"). My host code is very simple and no kernels are waiting for any events. I show the key code as follows (you can see I put Producer and Consumer into two different command queues, respectively). It is really weird.
#define MAX_CMD_QUEUES 4
#define N_KERNEL 2
void run_kernel(cl_command_queue cmd_queue[MAX_CMD_QUEUES], cl_kernel (&kernel)[N_KERNEL], size_t n_thread)
{
size_t global_work_size[1] = {(size_t)n_thread};
size_t local_work_size[1] = {(size_t)n_thread};
cl_event event_write[1], event_exec[2];
cl_int status;
cl_command_queue &rcmd_queue0 = cmd_queue[0], &rcmd_queue1 = cmd_queue[1], &rcmd_queue2 = cmd_queue[2];
// Write data into buf_in from h_in
status = clEnqueueWriteBuffer(rcmd_queue0, buf_in, CL_TRUE, 0, bufsize, h_in, 0, NULL, event_write);
error_check(status, "Write buf_in failed!\n");
// Launch the kernel Producer
status = clEnqueueNDRangeKernel(rcmd_queue0, kernel[0], 1, NULL, global_work_size, local_work_size, 1, event_write, NULL);
error_check(status, "Run kernel Producer error!\n");
// Launch the kernel Consumer
status = clEnqueueNDRangeKernel(rcmd_queue1, kernel[1], 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
error_check(status, "Run kernel Consumer error!\n");
// Read results back to h_out from buf_out
read_back_results(rcmd_queue1, bufsize, buf_out, (char*)h_out, NULL);
clReleaseEvent(event_write[0]);
clReleaseEvent(event_exec[0]);
clReleaseEvent(event_exec[1]);
}