Forum Discussion
Hi HRZ,
Thanks for the help.
Actually, one compute unit runs one work-group of the invoked kernel (say, kernel1) at a time. Once it finishes that work-group, it picks the next work-group of kernel1 from the command queue and starts processing it.
The reason I am asking about multiple physical compute units is this: if the FPGA has several physical compute units, we can run several work-groups of the same kernel in parallel (one compute unit per work-group) and so exploit parallelism across the work-groups of that kernel.
Thanks
- HRZ, Frequent Contributor, 4 years ago
That is exactly what the num_compute_units(X) attribute does. It replicates your compute unit X times and automatically distributes work-groups across the copies. Also note that even a single compute unit can pipeline work-items from different work-groups, so more than one work-group can be in flight inside one compute unit at any given time. For this reason, Intel recommends launching about three times as many work-groups as there are compute units, to keep every compute unit busy. Usage of this attribute is explained here:
Note that this attribute does NOT change CL_DEVICE_MAX_COMPUTE_UNITS, since there is still only one physical FPGA on the board.