Altera_Forum
Honored Contributor
12 years ago
GPUs schedule work-items in batches called warps or wavefronts, depending on the vendor, and this scheduling can sometimes mask synchronization issues. That said, after looking at your code again, and at the behavior you see when you remove the 'global_offset[addr + 1] = i;' statement, this doesn't look like a synchronization issue (my mistake, I was a little lost about what your code was attempting to do the first time I looked at it). I recommend opening a service request and attaching this kernel and the host application so that Altera can take a look at it.
If two work-items from different work-groups access the same location using an atomic operation, the OpenCL hardware is supposed to take care of that. It would be interesting to see whether atomic_inc works in the problematic scenario.
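A minimal sketch of such an atomic_inc test, in case it helps. The kernel name and the 'counter' and 'tickets' buffers are made up for illustration and are not from your code. It assumes the 32-bit global atomics are available (core in OpenCL 1.1; on an OpenCL 1.0 device the equivalent is atom_inc from the cl_khr_global_int32_base_atomics extension), and that the host zeroes 'counter' before the launch.

```c
// Hypothetical probe kernel: every work-item, whatever its work-group,
// increments one shared counter in global memory.
__kernel void atomic_probe(__global volatile int *counter,
                           __global int *tickets)
{
    // atomic_inc returns the value held *before* the increment, so each
    // work-item should receive a distinct ticket if the atomics work.
    int ticket = atomic_inc(counter);

    // Record the ticket per work-item so the host can inspect it later.
    tickets[get_global_id(0)] = ticket;
}
```

If you launch this with N work-items spread over several work-groups, then after the kernel finishes 'counter' should read N and 'tickets' should contain each value from 0 to N-1 exactly once, in no particular order. Duplicated or missing tickets would point to the atomics rather than to your own synchronization.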