Forum Discussion
Based on how you have written your code, it seems to me that the same address in global memory can be written multiple times by different threads; e.g. (id=X, i=0) and (id=0, i=X) will both write to global[a]. Since thread ordering is not guaranteed in NDRange kernels, whichever thread happens to write last determines the final value, so the output can differ from run to run. This is the source of the store (write-after-write) dependency. Whether you use ivdep or not, the output will still be unpredictable. To be honest, your code looks incorrect to me and will give unpredictable output regardless of the hardware you use (CPU, GPU, FPGA, etc.), unless "a" is an id-dependent offset that ensures no two threads ever write to the same address in the global buffer. If that is indeed the case, the dependency is a false one and you can safely use ivdep to avoid it.
Needless to say, this type of dependency does not apply to reads, since several threads reading the same address cannot change what is stored there.
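
To make the write collision concrete, here is a small host-side simulation in Python (not OpenCL). It is only a sketch under assumptions: I do not know your exact indexing, so it assumes each thread writes to global[a + id * i], which matches the (id=X, i=0) / (id=0, i=X) example above. The names run_kernel and distinct_outcomes are made up for illustration, and the simulation runs one thread to completion before the next, which is simpler than what real hardware does but is enough to show the effect.

```python
from itertools import permutations


def run_kernel(order, n, id_dependent_offset):
    """Simulate n threads, each doing n stores, executed in the given order.

    Assumed (hypothetical) store index: a + id * i.
    """
    buf = [None] * (n ** 3)  # large enough for both variants below
    for tid in order:
        # Either every thread shares a = 0, or each thread gets its own
        # disjoint region of n * n elements via an id-dependent offset.
        a = tid * n * n if id_dependent_offset else 0
        for i in range(n):
            buf[a + tid * i] = (tid, i)  # record who wrote last, and when
    return tuple(buf)


def distinct_outcomes(n, id_dependent_offset):
    """Count how many different final buffers the thread orderings produce."""
    orders = permutations(range(n))
    return len({run_kernel(order, n, id_dependent_offset) for order in orders})


if __name__ == "__main__":
    print("shared offset:      ", distinct_outcomes(3, False), "distinct outputs")
    print("id-dependent offset:", distinct_outcomes(3, True), "distinct output(s)")
```

With a shared offset the final buffer changes with the thread order, so no pragma can make the result well defined. With an id-dependent offset every ordering gives the same buffer, which is the situation where ivdep is safe.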