Altera_Forum
Honored Contributor
8 years ago
NDRange: work-item level parallelism vs. work-group level parallelism
Hello,
I am confused about one aspect of NDRange kernels. Suppose we have an NDRange configuration with 1 device, 4 CUs (compute units), and 1 PE inside each CU (1 PE means no SIMD). I already know that loop pipelining is disabled in NDRange configurations. Now consider the following two recommendations:

- Use a large enough work-group size to benefit from multi-threading many work-items over that single PE. I can guess why: the PE is probably pipelined over work-items (is that right?), so the pipeline is used efficiently only when there are many work-items.
- Use a large number of work-groups to benefit from multiple CUs. I really do not understand this one. Aren't CUs completely independent? Why does the tool recommend this when I have multiple CUs? How can there be parallelism at the work-group level?

Thanks
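To make the second recommendation concrete, here is a small toy model of how I picture it, not the real hardware scheduler. It assumes a work-group is always dispatched as a whole to a single CU and that work-groups are handed out round-robin. The round-robin order and the function names `assign_work_groups` and `busy_cus` are my own assumptions for illustration; a real dispatcher presumably picks whichever CU is free.

```python
def assign_work_groups(global_size, local_size, num_cus):
    """Return how many work-groups each CU receives.

    Toy model (assumption): a work-group never spans CUs, and groups
    are handed out round-robin. Real dispatch order may differ.
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the work-group size")
    num_groups = global_size // local_size
    load = [0] * num_cus
    for group_id in range(num_groups):
        load[group_id % num_cus] += 1
    return load


def busy_cus(global_size, local_size, num_cus):
    """Count the CUs that receive at least one work-group."""
    return sum(1 for n in assign_work_groups(global_size, local_size, num_cus) if n > 0)


if __name__ == "__main__":
    # One huge work-group: only one of the 4 CUs gets any work.
    print(assign_work_groups(1024, 1024, 4))
    # Sixteen work-groups of 64 work-items: every CU gets work.
    print(assign_work_groups(1024, 64, 4))
```

If this model is roughly right, the CUs are independent, but each one only has work when at least one work-group is sent to it. With fewer work-groups than CUs, some CUs would simply sit idle.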