Forum Discussion
Altera_Forum
Honored Contributor
11 years ago
The throughput gain does not come mainly from "parallel" execution of work-items on different "processing units". Yes, work-items do execute in parallel across compute units (if num_compute_units is specified) and within a single compute unit (if num_simd_work_items is specified). However, the gain comes mainly from pipelined execution of work-items. Say you have 32 work-items: ideally it takes 32 cycles to issue all of them to one compute unit, one per cycle. If the kernel computation takes 1000 cycles, the first work-item completes after about 1000 cycles, and from then on one work-item completes every cycle. Essentially, these 32 work-items execute one cycle behind each other, and the pipeline has more than enough stages to accommodate all of them in the compute unit at once.
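To put rough numbers on the example above, here is a minimal sketch of the cycle count for an idealized pipeline. It assumes one work-item is issued per cycle with no stalls, and the function names (`pipelined_cycles`, `serial_cycles`) are made up for illustration; they are not part of any Altera tool or API.

```python
def pipelined_cycles(num_work_items: int, latency: int, issue_interval: int = 1) -> int:
    """Cycles until the last work-item leaves an ideal, stall-free pipeline.

    Assumption: a new work-item enters every `issue_interval` cycles and each
    one takes `latency` cycles to travel through the pipeline.
    """
    if num_work_items <= 0:
        return 0
    # The first work-item finishes after `latency` cycles; each later one
    # finishes `issue_interval` cycles after its predecessor.
    return latency + (num_work_items - 1) * issue_interval


def serial_cycles(num_work_items: int, latency: int) -> int:
    """Cycles if each work-item had to finish before the next one started."""
    return max(num_work_items, 0) * latency


if __name__ == "__main__":
    n, depth = 32, 1000  # numbers taken from the post above
    print("pipelined:", pipelined_cycles(n, depth))
    print("serial:   ", serial_cycles(n, depth))
```

With the numbers from the post, the pipelined case finishes all 32 work-items in 1031 cycles, versus 32000 cycles if they ran strictly one after another. A real kernel will usually do worse than this ideal because of memory stalls and similar effects.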