Forum Discussion
Does your "checkError" function happen to contain a clFlush, clFinish, clWaitForEvents, or similar blocking call that might be serializing some of the enqueues? Other than that, your host code looks fine, and the kernel execution and the read operations should overlap completely.
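To illustrate what I mean, here is a rough sketch of an error-check helper that cannot serialize anything, because it only compares the status code and prints a message. The names `check_error`, `CHECK_ERROR`, `status_t` and `STATUS_SUCCESS` are made up for this example and are not taken from your code; in real host code the status would be a `cl_int` from `CL/cl.h` and the success value would be `CL_SUCCESS`, which is 0.

```c
#include <stdio.h>

/* Hypothetical stand-ins so the sketch compiles without the OpenCL headers.
 * In real host code these would be cl_int and CL_SUCCESS (which is 0). */
typedef int status_t;
#define STATUS_SUCCESS 0

/* Returns 1 if `status` signals an error and 0 otherwise.
 * It only compares the code and prints a message; it makes no call to
 * clFlush, clFinish, or clWaitForEvents, so it cannot force the host to
 * wait on previously enqueued commands. */
static int check_error(status_t status, const char *what,
                       const char *file, int line)
{
    if (status != STATUS_SUCCESS) {
        fprintf(stderr, "%s:%d: %s failed with error %d\n",
                file, line, what, status);
        return 1;
    }
    return 0;
}

/* Convenience wrapper that records the call site. */
#define CHECK_ERROR(status, what) \
    check_error((status), (what), __FILE__, __LINE__)
```

You would call it right after each enqueue, for example `CHECK_ERROR(status, "clEnqueueReadBuffer")`. If your own helper does more than this, in particular if it waits on the queue or on an event, that would be the first place I would look.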
P.S. I remember that Intel's runtime and BSP used to have a limitation that prevented simultaneous reads and writes over PCI-E, even though PCI-E is a full-duplex medium. That limitation could unnecessarily serialize simultaneous PCI-E reads and writes, and it was fixed in some 18.x version of the compiler. However, I don't think it applies to your case, since you are trying to overlap kernel execution with a PCI-E read and no PCI-E write is involved. The fix also requires a compatible BSP, which I don't think you have, or else you wouldn't be using v17.1 of the compiler.