Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
When I use NDRange kernel,
If my kernel code calls get_global_id, the report shows that it is an NDRange kernel, so it won't be pipelined. Is there any way to pipeline an NDRange kernel? Would using multiple compute units help? And how can I measure stall time?

I have an (8,256,256) NDRange kernel with a local work-group size of (1,64,64), but the computation performance is very low when I try to increase the local size further. I think this is because, within a group of 64x64 work-items, each work-item has to wait until all work-items in the same group have finished, and only then can the next 64x64 work-items be launched. Is that correct? And how can I measure the time they spend waiting?
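To make the launch geometry concrete, here is a minimal sketch in plain C of the arithmetic behind the sizes quoted above. The helper names are hypothetical and used only for illustration; the code only counts work-groups and work-items, and says nothing about how the FPGA compiler actually schedules them.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-groups along one dimension.
 * Returns 0 if the local size is zero or does not divide the global size
 * evenly (OpenCL 1.x requires the global size to be a multiple of the
 * local size when a local size is given). */
size_t groups_in_dim(size_t global, size_t local)
{
    if (local == 0 || global % local != 0) {
        return 0;
    }
    return global / local;
}

/* Hypothetical helper: total number of work-groups in a 3-D NDRange. */
size_t total_work_groups(const size_t global[3], const size_t local[3])
{
    size_t total = 1;
    for (int d = 0; d < 3; ++d) {
        total *= groups_in_dim(global[d], local[d]);
    }
    return total;
}

/* Hypothetical helper: number of work-items inside one work-group. */
size_t work_items_per_group(const size_t local[3])
{
    return local[0] * local[1] * local[2];
}
```

With a global size of (8,256,256) and a local size of (1,64,64), total_work_groups gives 8 x 4 x 4 = 128 work-groups and work_items_per_group gives 4096 work-items per group, for 524288 work-items overall.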