Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

--- Quote Start ---
- Try to use large enough workgroup size to get benefit of multi-threading of many work-items over that single PE. I can guess why: probably the PE is pipelined over work-items (is that right?), and the pipeline is used efficiently only if there are many work-items.
--- Quote End ---

This is true.

--- Quote Start ---
- Try to use large number of work-groups to get benefit of multiple CU. I really do not understand this. Aren't CUs completely independent? Why does the tool recommend this to me when I have multiple CUs? How can there be parallelism at the work-group level?
--- Quote End ---

This is more or less the same concept as above. Let's say you have four CUs and a total of six work-groups, and a CU takes X seconds to process one work-group. In this case, four work-groups will be scheduled onto the four available CUs simultaneously. When they finish, the remaining two work-groups are scheduled onto two CUs, leaving the other two CUs unused. In the end, the whole run finishes after 2X seconds.

Basic math tells you that even with only three CUs the run time would still be 2X (three work-groups in the first round, three in the second); hence, you get no benefit from the extra CU, since you do not have enough work-groups to keep all the CUs busy all the time. However, if you have a large enough number of work-groups, four CUs will be ~33% faster than three (4/3 the throughput).

Note that this is the theoretical case; in practice, performance scaling with multiple CUs also depends on external memory bandwidth and operating frequency.
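
To make the arithmetic above concrete, here is a minimal Python sketch of the same idealized model. It assumes every work-group takes the same time X, that a CU processes one work-group at a time, and that memory bandwidth and clock frequency do not change with the CU count. The function names are made up for illustration; this is not how the runtime actually schedules work-groups.

```python
def rounds_needed(num_work_groups: int, num_cus: int) -> int:
    """Number of scheduling rounds in the idealized model.

    Each round takes X seconds and processes up to num_cus work-groups,
    so total run time is rounds_needed(...) * X.
    """
    if num_work_groups < 0 or num_cus <= 0:
        raise ValueError("need num_work_groups >= 0 and num_cus > 0")
    # Integer ceiling division: a partially filled last round still costs X.
    return (num_work_groups + num_cus - 1) // num_cus


def speedup(num_work_groups: int, cus_before: int, cus_after: int) -> float:
    """Idealized speedup when going from cus_before to cus_after CUs."""
    return rounds_needed(num_work_groups, cus_before) / rounds_needed(
        num_work_groups, cus_after
    )


if __name__ == "__main__":
    # Six work-groups: the fourth CU buys nothing.
    print(rounds_needed(6, 3), rounds_needed(6, 4))
    # Many work-groups: four CUs approach 4/3 the throughput of three.
    print(speedup(1200, 3, 4))
```

With six work-groups both three and four CUs need two rounds, while with 1200 work-groups the model gives 400 rounds versus 300, which is the ~33% figure mentioned above.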