Can single work-item kernels run in parallel on the same device?

Honored Contributor

8 years ago

Any type of kernel can run in parallel with another, as long as each one is invoked in a separate queue and no event is used to forcibly serialize them. The key points are that the kernels must run in different queues, and that you should not force the host to wait for each kernel execution separately, whether by using commands like clFlush() or clFinish() or by waiting on events, before the remaining kernels have been enqueued. After invoking all the kernels from the host, you can, and probably should, wait for the event associated with each kernel invocation, or call clFinish() on every single queue you have, to make sure all kernels have finished execution before you read the data back from the device.

A more efficient way to accomplish this is to use replicated autorun kernels; more details are available in "Intel FPGA SDK for OpenCL Programming Guide, Section 11.4".

Finally, I should emphasize that since external memory bandwidth is shared between the kernels running in parallel, you should not expect linear speed-up from using multiple parallel kernels. In fact, if one of your kernels is already memory-bound on its own, you will not see any speed-up at all by replicating it.
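To put rough numbers on the bandwidth point, here is a first-order estimate. It is a simplification and not a measurement: it assumes every replica needs the same external memory bandwidth when running alone and that the board's peak bandwidth is the only shared bottleneck. The function name `estimated_speedup` and its parameters are made up for illustration.

```c
#include <assert.h>

/* First-order estimate of the speed-up from running n identical kernel
 * replicas that share external memory.
 *   per_kernel_gbps: bandwidth one replica uses when running alone
 *   peak_gbps:       peak external memory bandwidth of the board
 * The result is capped both by n and by how many replicas the memory
 * can feed at full rate. */
double estimated_speedup(unsigned n, double per_kernel_gbps, double peak_gbps)
{
    assert(n >= 1 && per_kernel_gbps > 0.0 && peak_gbps > 0.0);

    double limit = peak_gbps / per_kernel_gbps;
    if (limit < 1.0)
        limit = 1.0; /* a single replica already saturates the memory */

    return ((double)n < limit) ? (double)n : limit;
}
```

With made-up numbers, on a board with 20 GB/s of peak bandwidth, four replicas of a kernel that uses 10 GB/s on its own would be estimated at about 2x rather than 4x, and a kernel that already uses the full 20 GB/s would stay at 1x no matter how many times it is replicated.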

P.S. I have done this multiple times, and it certainly works.
