Altera_Forum
Honored Contributor
12 years ago
What is the NDRange size? This line could (and most likely will) be executed by multiple work-items concurrently:
global_offset[addr + 1] = i;
Also, because of that loop, you don't know the order in which the work-items will use that atomic add operator (and you shouldn't assume any ordering when using OpenCL in general), so reading the result after the atomic add and using it to index into memory that way doesn't look safe to me. I suspect a data hazard is causing the issue you are seeing, so I recommend refactoring your code to avoid race conditions like these.
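To illustrate the hazard, here is a minimal host-side sketch in C++ (not OpenCL C), assuming std::thread is an acceptable stand-in for work-items; reserve_slots and every_item_has_unique_slot are hypothetical names, not taken from the original kernel. It contrasts using the value returned by the atomic add as an index (OpenCL's atomic_add likewise returns the old value), which gives every work-item a distinct slot, with re-reading the counter afterwards, which does not.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Host-side analogy: each std::thread stands in for one OpenCL work-item.
// Returns the output buffer after all "work-items" have finished.
std::vector<int> reserve_slots(int num_items) {
    std::atomic<int> next_slot{0};        // shared counter, like a global int
    std::vector<int> out(num_items, -1);  // one slot per work-item

    std::vector<std::thread> items;
    for (int id = 0; id < num_items; ++id) {
        items.emplace_back([&next_slot, &out, id] {
            // SAFE: fetch_add returns the value held *before* this increment,
            // and every caller receives a different value, so each item
            // writes to its own slot.
            int slot = next_slot.fetch_add(1);
            out[slot] = id;

            // UNSAFE alternative (not executed here): increment first, then
            // re-read the counter, e.g.
            //   next_slot.fetch_add(1);
            //   out[next_slot.load() - 1] = id;
            // Another item may increment between the two calls, so two items
            // can write the same slot while another slot stays empty.
        });
    }
    for (auto& t : items) t.join();  // wait for every "work-item"
    return out;
}

// The order of ids in the buffer differs between runs; only the set of ids
// is fixed. Returns true if every id 0..n-1 appears exactly once.
bool every_item_has_unique_slot(std::vector<int> out) {
    std::sort(out.begin(), out.end());
    for (int i = 0; i < static_cast<int>(out.size()); ++i) {
        if (out[i] != i) return false;
    }
    return true;
}
```

For example, every_item_has_unique_slot(reserve_slots(64)) should return true on every run, even though the order in which items claim slots changes from run to run.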
OpenCL handles atomics internally, so it doesn't need to rely on the interconnect to provide atomicity. Anywhere you perform an atomic operation in your kernel, dedicated hardware is generated to ensure that no data hazards occur for that particular operation. If you use multiple atomic operations, each one operates independently of the others, so you may need additional synchronization between them if they share data.
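A minimal C++ sketch of that last point, again using threads as stand-ins for work-items; Stats, IndependentAtomics, LockedStats and run_items are hypothetical names chosen for illustration. Two atomic updates to related values are each hazard-free, but nothing makes the pair atomic, so a concurrent reader can see them out of step. A lock around both updates is one form of the additional synchronization mentioned above.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical running statistics updated by many "work-items" (threads).
struct Stats {
    long long sum = 0;
    long long count = 0;
};

// Each update is two separate atomic operations. Each one is hazard-free on
// its own, but the pair is not: a reader running concurrently can observe
// the new sum together with the old count.
struct IndependentAtomics {
    std::atomic<long long> sum{0};
    std::atomic<long long> count{0};
    void add(long long v) {
        sum.fetch_add(v);
        count.fetch_add(1);
    }
};

// Additional synchronization: one lock covers both fields, so any reader
// that also takes the lock always sees a consistent (sum, count) pair.
struct LockedStats {
    std::mutex m;
    Stats s;
    void add(long long v) {
        std::lock_guard<std::mutex> g(m);
        s.sum += v;
        s.count += 1;
    }
    Stats snapshot() {
        std::lock_guard<std::mutex> g(m);
        return s;
    }
};

// Runs num_items threads; the thread with index id adds the value id + 1.
template <typename Acc>
void run_items(Acc& acc, int num_items) {
    std::vector<std::thread> items;
    for (int id = 0; id < num_items; ++id) {
        items.emplace_back([&acc, id] { acc.add(id + 1); });
    }
    for (auto& t : items) t.join();  // join() is itself a synchronization point
}
```

After run_items returns, both versions report the same totals (for 10 items, sum 55 and count 10), because join() already synchronizes; the difference only shows up for readers running concurrently with the updates. An OpenCL kernel has no mutex, so the comparable options would be restructuring so related values are covered by a single atomic, or separating the phases with a barrier (within a work-group) or a separate kernel launch.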