Elongate pipeline voluntarily

Honored Contributor

11 years ago

NDRange kernels typically hide these stalls by keeping many work-items in flight to fill the bubbles in the pipeline. The first line of your pseudocode suggests that you are indexing into memory in a non-sequential (or unpredictable) order, which I think is the source of the problem you are running into. So even though the kernel scheduler will attempt to keep the pipeline full, the access pattern will most likely prevent read data from returning fast enough to keep the pipeline busy. OpenCL aside, if a master reads from an SDRAM device in random order you will typically see idle periods between blocks of returning read data. When SDRAM is accessed sequentially, the read data typically returns in long continuous blocks (i.e. no stalls).

Instead of trying to elongate the pipeline (which I doubt will help, and which is not easy to do without knowing how the compiler works), maybe you can describe the size of the data being accessed by the kernel and whether the index has any predictable pattern, and we can try to suggest a way to improve the memory accesses and address the root of the problem. In cases like these I typically either change my algorithm to access memory in a different order, or preload a block of global memory sequentially and then access the local copy randomly (local memory can be accessed in any order without any performance degradation).
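To make the preload idea concrete, here is a rough sketch of the pattern in plain C rather than real kernel code. It assumes all the random indices for one batch of work fall inside a single block of the table; `BLOCK_SIZE`, `gather_sum_from_block`, and the summing workload are made-up names for illustration, not anything from your kernel. In an actual OpenCL kernel the `local_copy` array would live in `__local` memory and the copy loop would be shared across the work-items of a work-group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block size; in a real kernel it must fit in local memory. */
#define BLOCK_SIZE 256

/*
 * Sums table[idx[i]] for i in [0, n).
 * Assumption: every idx[i] lies in [block_start, block_start + BLOCK_SIZE).
 */
long gather_sum_from_block(const int *table, size_t block_start,
                           const size_t *idx, size_t n)
{
    int local_copy[BLOCK_SIZE];   /* stands in for __local memory */
    long sum = 0;
    size_t i;

    /* Step 1: read the block in address order, so the external memory
       sees one long sequential burst instead of scattered requests. */
    for (i = 0; i < BLOCK_SIZE; ++i)
        local_copy[i] = table[block_start + i];

    /* Step 2: the unpredictable indexing now only touches the local copy. */
    for (i = 0; i < n; ++i) {
        assert(idx[i] >= block_start && idx[i] - block_start < BLOCK_SIZE);
        sum += local_copy[idx[i] - block_start];
    }
    return sum;
}
```

Whether this pays off depends on how many random reads you do per block: the sequential copy costs `BLOCK_SIZE` reads up front, so it only helps if the block is reused enough to amortize that.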
