Help with porting a CUDA code to OpenCL

Honored Contributor

8 years ago

1) You can create multiple command queues and have multiple kernels or buffer copies running in parallel, then synchronize them using events. Note that global memory consistency is only guaranteed at the end of kernel execution; hence, if you share a read_write buffer between multiple queues without synchronizing the calls using events, the behavior is undefined.

2) You can have as many kernels as you want in each queue. Kernel enqueue calls are non-blocking; with a default (in-order) queue, the kernels are queued on the device and executed in the order they were enqueued. You can put the two different parts of your code in two different kernels and run them in-order.

3) I would remove all GPU-based optimizations from the code, convert the kernel to sequential C code, and compile it as a single work-item kernel.
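
To make point 3 a bit more concrete, here is a minimal sketch of what such a conversion could look like, using a hypothetical vector-add kernel. The names `vadd_ndrange`, `vadd_single` and `check_vadd_single` and their parameters are made up for illustration and are not taken from the code being ported. The NDRange form shown in the comment is roughly what a direct CUDA port tends to look like; the single work-item form replaces the thread index with an ordinary loop. The `#ifndef __OPENCL_VERSION__` part is only an assumption of this sketch so that the same file can also be built with a regular C compiler for a quick functional check.

```c
/* Host-side shim (assumption of this sketch): when the file is built with a
   regular C compiler instead of an OpenCL compiler, the OpenCL qualifiers are
   defined away so the kernel body can be called like a normal C function. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#define __kernel
#define __global
#endif

/* A direct port of a CUDA kernel would typically be an NDRange kernel:
 *
 *   __kernel void vadd_ndrange(__global const float *a,
 *                              __global const float *b,
 *                              __global float *c, int n)
 *   {
 *       int i = get_global_id(0);  // was blockIdx.x * blockDim.x + threadIdx.x
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* Single work-item version: no get_global_id(), no local memory, no barriers,
   just a sequential loop over the data. It is meant to be launched with
   exactly one work-item. */
__kernel void vadd_single(__global const float *a,
                          __global const float *b,
                          __global float *c,
                          int n)
{
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

#ifndef __OPENCL_VERSION__
/* Host-only functional check of the kernel body (hypothetical helper). */
void check_vadd_single(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    vadd_single(a, b, c, 4);

    assert(c[0] == 11.0f);
    assert(c[3] == 44.0f);
}
#endif
```

On the host side, a kernel like this would be enqueued with a global size of 1 (or with clEnqueueTask on OpenCL 1.x). As far as I know, the FPGA compiler treats a kernel as single work-item when it does not query any work-item IDs, but check the programming guide for the exact conditions on your version.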

You will probably need to spend a good deal of time reading Khronos's OpenCL documents and also Altera's OpenCL documents (getting started guide, programming guide, best practices guide). Don't concern yourself with which version of OpenCL is supported on the FPGA; you should not need the features that only exist in newer versions just yet.
