Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
1) You can create multiple queues and have multiple kernels or buffer copy calls running in parallel, and then synchronize them using events. Note that global memory consistency is only guaranteed at the end of kernel execution. Hence, sharing a read_write buffer between multiple queues gives undefined behavior unless you synchronize the calls using events.
2) You can have as many kernels as you want in each queue. Kernel execution calls are non-blocking; the kernels are queued on the device and executed in order within that queue. You can put the two different parts of your code in two different kernels and run them one after the other.
3) I would remove all GPU-based optimizations from the code, convert the kernel to sequential C code, and compile it as a single work-item kernel. You will probably need to spend a good deal of time reading Khronos's OpenCL documents and also Altera's OpenCL documents (getting started guide, programming guide, best practices guide). Don't concern yourself with which version of OpenCL is supported on the FPGA; you should not need the features that only exist in newer versions just yet.
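To make point 3 a bit more concrete, here is a rough sketch of what a single work-item kernel can look like. It is not your code: the kernel name `sum_single_work_item`, its arguments, and the choice of a simple sum are all made up for illustration. The `KERNEL`/`GLOBAL` macros are also my own addition, so the same loop can be compiled as plain C on the host to check the logic before going through a full FPGA compile.

```c
/* An OpenCL C compiler defines __OPENCL_VERSION__, so the macros expand to
   the real qualifiers there. On a plain C compiler they expand to nothing,
   which lets the same function be built and checked on the host. */
#ifdef __OPENCL_VERSION__
#define KERNEL __kernel
#define GLOBAL __global
#else
#define KERNEL
#define GLOBAL
#endif

/* Hypothetical example: sum n floats.
   There is no get_global_id(), no local memory and no barrier here.
   One work-item walks the whole array in an ordinary sequential loop. */
KERNEL void sum_single_work_item(GLOBAL const float *restrict in,
                                 GLOBAL float *restrict out,
                                 const int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += in[i];
    }
    out[0] = acc;
}
```

On the host side, a kernel like this would be launched with a global and local work size of 1, for example with clEnqueueTask in OpenCL 1.x.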