ND-range: work-item-level parallelism vs. work-group-level parallelism
Hello, I am unsure about how ND-ranges work. Suppose we have an ND-range running on 1 device with 4 CUs (compute units) and 1 PE (processing element) inside each CU (1 PE means no SIMD). I already know that loop pipelining ...