Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

--- Quote Start ---
@Jasmine-J, assuming that you are programming both the CPU and FPGA using OpenCL, you can create two separate queues, one for each device, run your kernels in parallel on the different queues, and use events to synchronize them. Either way, clEnqueueNDRangeKernel() is NOT a blocking call, and the best way to synchronize kernels or code segments that are supposed to run in parallel is to use events.
--- Quote End ---

Thanks HRZ,

I did not use OpenCL for the CPU; I used C++11 threads. What I want now is, after calling clEnqueueNDRangeKernel() for the FPGA, to let the CPU start its thread WITHOUT synchronizing.

As I understand it, if I call clFinish, the host will WAIT for the kernel to complete, and without clFinish it will NOT WAIT. But I cannot understand why the execution time with clFinish is shorter than the time without clFinish.

In other words, I just want the total execution time of the CPU and FPGA not to be simply the sum of the CPU execution time and the FPGA execution time.

Thanks again.
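
To make the timing question concrete, here is a minimal sketch of the overlap pattern being asked about. It does not call OpenCL at all: the FPGA kernel is replaced by a hypothetical stand-in (`fake_fpga_kernel`) that only sleeps, and `std::async` plays the role of a non-blocking enqueue. The names `fake_fpga_kernel`, `cpu_work`, `run_overlapped`, and `run_serialized` are all made up for illustration. In real host code the launch would presumably be clEnqueueNDRangeKernel() (possibly followed by clFlush() so the command is actually submitted), and the final wait would be clFinish() or clWaitForEvents().

```cpp
#include <chrono>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Hypothetical stand-in for the FPGA kernel: it only sleeps for `ms`.
void fake_fpga_kernel(int ms) { std::this_thread::sleep_for(Ms(ms)); }

// Hypothetical stand-in for the CPU workload run in a C++11 thread.
void cpu_work(int ms) { std::this_thread::sleep_for(Ms(ms)); }

// Overlapped: launch the "kernel" without waiting, start the CPU thread
// right away, and wait for the kernel only at the very end.
long long run_overlapped(int fpga_ms, int cpu_ms) {
    auto start = Clock::now();
    // Non-blocking launch (assumed analogue of enqueue + clFlush).
    std::future<void> fpga =
        std::async(std::launch::async, fake_fpga_kernel, fpga_ms);
    std::thread cpu(cpu_work, cpu_ms);  // CPU work starts immediately
    cpu.join();
    fpga.wait();  // assumed analogue of clFinish() / clWaitForEvents()
    return std::chrono::duration_cast<Ms>(Clock::now() - start).count();
}

// Serialized: wait for the "kernel" before starting the CPU work, which is
// what happens if the wait is placed directly after the launch.
long long run_serialized(int fpga_ms, int cpu_ms) {
    auto start = Clock::now();
    fake_fpga_kernel(fpga_ms);
    cpu_work(cpu_ms);
    return std::chrono::duration_cast<Ms>(Clock::now() - start).count();
}
```

With both workloads set to 200 ms, `run_overlapped(200, 200)` should take roughly 200 ms, while `run_serialized(200, 200)` takes at least 400 ms. The point of the sketch is only the placement of the wait: putting it after the CPU thread has been started (and joined) is what lets the total approach the longer of the two times rather than their sum.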