Forum Discussion
Altera_Forum
Honored Contributor
8 years agoAre you talking about num_compute_units for NDRange kernels or single work-item kernels? num_compute_units for NDRange kernels works in a fully automatic manner and does not require any user intervention other than adding the attribute to the kernel header. The compiler will automatically replicate the pipeline in this case, allowing multiple work-groups to be scheduled in parallel. This obviously comes at the cost of higher area usage and higher memory bandwidth utilization. If memory bandwidth is saturated, using num_compute_units will actually reduce performance due to extra memory contention.