Altera_Forum
Honored Contributor
9 years ago
How to launch replicated Single Work-Item Kernels (using num_compute_units)
I am trying to understand how to replicate single work-item kernels (tasks) and especially how to call them.
The programming guide (https://www.altera.com/en_us/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf) says (e.g. on page 2-27, but also in other places) that you can specify __attribute__((max_global_work_dim(0))) to force a kernel to be a single work-item kernel, and __attribute__((num_compute_units(2))) to replicate the engine (two replicas in my example). Taking the fft1d code from the Altera OpenCL design examples, this would mean that I can instantiate two independent FFT engines on the FPGA, which seems straightforward enough.

However, what I don't understand is how to launch that kernel so that both replicas are used. The fft1d example performs 2000 FFTs, each of size 4096. The 2000 is an input parameter to the FFT kernel, which then implements the loop. When the kernel is launched with clEnqueueTask(), my understanding is that this creates only one work-item in one work-group, so it can only run on one of the two FFT engines, right?

So how do I have to launch the kernel so that each engine does half the work (1000 FFTs)? I can't do it with clEnqueueTask(), because I can't specify how the work is distributed between the engines, and I probably (?) can't use clEnqueueNDRangeKernel() because it's not an NDRange kernel but a single work-item kernel (task).

Any help is greatly appreciated!