#pragma ivdep not allowing for parallel stores to local memory

Honored Contributor

8 years ago

#pragma ivdep is designed to avoid false load/store dependencies on global memory accesses. These false dependencies arise because certain information is available only in the host code and not to the kernel compiler. I have never encountered a case in which this pragma had any effect on load/store dependencies on local buffers. In my experience, the compiler never makes a mistake in detecting such dependencies on local buffers; hence, trying to override them will likely result in incorrect output.

In your case, it is not easy to make a judgment without seeing the whole code. However, from what I can see, you are using indirect addressing into the local buffer. The compiler cannot know that these indirect accesses will not overlap and hence has to force sequential reads and writes. In single work-item kernels, this overhead is generally unavoidable unless you change your design strategy. If you are certain that these addresses never overlap, there must be another method you can use to avoid the indirect addressing. If there is no way to avoid it, I think you might be able to get better performance with NDRange kernels, because at least in that case the scheduler will try to maximize pipeline utilization at runtime by reordering the threads, rather than forcing fully sequential operation.
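To make the overlap concern concrete, here is a minimal sketch in plain host-side C (not OpenCL kernel code, and not what the compiler actually generates). It models a read-modify-write on a small buffer through an index array, once in strict iteration order and once with every load issued before any store lands, which is roughly what would happen if the dependency were ignored. All names (`update_sequential`, `update_overlapped`, `orders_agree`, `BUF_SIZE`) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8  /* illustrative buffer size; n must not exceed it */

/* Reference order: each iteration completes its load, add and store
 * before the next iteration starts. */
void update_sequential(int *buf, const int *idx, const int *val, size_t n) {
    for (size_t i = 0; i < n; i++) {
        buf[idx[i]] = buf[idx[i]] + val[i];
    }
}

/* Simplified model of ignoring the dependency: every load is issued
 * before any store has landed, so an iteration can read a stale value. */
void update_overlapped(int *buf, const int *idx, const int *val, size_t n) {
    int loaded[BUF_SIZE];
    for (size_t i = 0; i < n; i++) {
        loaded[i] = buf[idx[i]];            /* all loads first */
    }
    for (size_t i = 0; i < n; i++) {
        buf[idx[i]] = loaded[i] + val[i];   /* then all stores */
    }
}

/* Returns 1 if both orders leave the buffer in the same state, else 0. */
int orders_agree(const int *idx, const int *val, size_t n) {
    int a[BUF_SIZE] = {0};
    int b[BUF_SIZE] = {0};
    update_sequential(a, idx, val, n);
    update_overlapped(b, idx, val, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

With distinct indices such as {0, 1, 2, 3} the two orders agree, so overlapping the accesses would be safe. With a repeated index such as {1, 1, 2, 3} they differ, because the second iteration reads the slot before the first iteration's store is visible. Since the indices are only known at runtime, the compiler has to assume the second case.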

I am not sure whether it applies to your case, but if FIFO-based synchronization can help you, you can always use the channels extension.
