How can I get the OpenCL SDK compiler to reduce kernel logic utilization?

Honored Contributor

12 years ago

The reqd_work_group_size attribute often affects on-chip memory utilization: the larger the work-group size, the larger the on-chip RAM footprint typically needs to be to handle all those work-items.

For now, num_share_resources is the most appropriate attribute to use, but it relies on identical portions of logic being present in the compute unit. One way to ensure that such similar functionality exists is to write auxiliary functions (sub-functions). If you call an auxiliary function from multiple places in the kernel, adding the num_share_resources attribute will hopefully cause the compiler to share that function's hardware throughout the kernel instead of creating multiple copies (i.e. inlining).

Do you have any expensive operators in your kernel? Some of the higher-level trig functions can become fairly large in hardware, so that may be something you can address. Also, if any calculation produces the same result for every work-item, it makes more sense to perform it on the host, which saves hardware as well as compute time. For example, if you had something like this:

__kernel (......, float n)
{
    a[get_global_id(0)] = b[get_global_id(0)] * log(n) * c[get_global_id(0)];
}

Then you should calculate log(n) on the host instead of having each work-item perform the same calculation, and just pass the value in as a kernel argument.
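To make the idea concrete, here is a rough sketch of the transformation. It is a plain Python emulation, not OpenCL host code: the function names (kernel_per_item_log, kernel_with_hoisted_log, host_launch) are made up for illustration, the lists b and c stand in for the kernel buffers, and the loop index gid plays the role of get_global_id(0). Python floats are double precision, so the numbers will not match a single-precision kernel bit for bit.

```python
import math


def kernel_per_item_log(b, c, n):
    """Emulates the original kernel: every work-item evaluates log(n) itself."""
    return [b[gid] * math.log(n) * c[gid] for gid in range(len(b))]


def kernel_with_hoisted_log(b, c, log_n):
    """Emulates the revised kernel: log_n arrives as an argument, so each
    work-item performs only the two multiplications."""
    return [b[gid] * log_n * c[gid] for gid in range(len(b))]


def host_launch(b, c, n):
    """Host side (hypothetical helper): evaluate log(n) once, then hand the
    result to the kernel in place of n."""
    log_n = math.log(n)  # computed a single time, outside the kernel
    return kernel_with_hoisted_log(b, c, log_n)
```

Both versions produce the same output for the same inputs. The difference on the FPGA is that the revised kernel contains no logarithm at all, so presumably no hardware has to be built for it.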
