Forum Discussion
Altera_Forum
Honored Contributor
12 years ago
The reqd_work_group_size attribute often has an impact on on-chip memory utilization, because a larger work-group size typically requires a larger on-chip RAM footprint to handle all of those work-items.
For now num_share_resources is the most appropriate attribute to use, but it relies on the compute unit containing identical portions of hardware. One way to ensure that there is similar functionality is to code auxiliary functions (sub-functions). If you call the auxiliary function from multiple places in the kernel, then adding the num_share_resources attribute will hopefully cause the compiler to share that function's hardware throughout the kernel instead of creating multiple copies (i.e. inlining).

Do you have any expensive operators in your kernel? Some of the higher-level trig functions can become fairly big in hardware, so perhaps that is something that can be addressed.

Also, if you have any calculations that are redundant, it would make more sense to calculate those on the host, not only for a compute-time savings but for hardware savings as well. For example, if you had something like this:

__kernel (......, float n)
{
  a[get_global_id(0)] = b[get_global_id(0)] * log(n) * c[get_global_id(0)];
}

then you should calculate log(n) on the host instead of having each work-item perform the same calculation, and just pass the value in as a kernel argument.
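
To make the log(n) suggestion concrete, here is a minimal sketch of what the reworked kernel might look like. The kernel name scale_by_log, the argument names, and the helper run_emulated are hypothetical and not from the original post. The small shim at the top is only an assumption so that the same source can be compiled and checked as plain C on the host; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
/* Sketch only: kernel name, argument names, and the host-side shim are
   illustrative assumptions, not taken from the original post. */
#ifndef __OPENCL_VERSION__
/* Host-side shim so the kernel body also compiles as plain C.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips this. */
#include <stddef.h>
#define __kernel
#define __global
static size_t emulated_gid;  /* index of the emulated work-item */
static size_t get_global_id(unsigned dim) { (void)dim; return emulated_gid; }
#endif

/* log_n is computed once on the host and passed in as an argument,
   so the kernel itself contains no log() call. */
__kernel void scale_by_log(__global float *a,
                           __global const float *b,
                           __global const float *c,
                           float log_n)
{
    size_t i = get_global_id(0);
    a[i] = b[i] * log_n * c[i];
}

#ifndef __OPENCL_VERSION__
/* Emulates a 1-D range of `count` work-items with a loop. A real host
   program would set log_n as a kernel argument and enqueue the kernel. */
static void run_emulated(float *a, const float *b, const float *c,
                         float log_n, size_t count)
{
    for (emulated_gid = 0; emulated_gid < count; ++emulated_gid)
        scale_by_log(a, b, c, log_n);
}
#endif
```

On the host side you would evaluate logf(n) once and hand the result to the kernel with clSetKernelArg, in the same slot where n was passed before. Since the kernel no longer calls log(), the compiler should have no reason to build logarithm hardware for it.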