Forum Discussion
Let me emphasize that the cache is not kernel-wide but per access (at least judging by what the area report shows). Hence, the cache created for one access in a kernel is not shared with the cache for another access in the same kernel, even if both accesses read from or write to the same global buffer. If every writer that might write to a given global memory location always writes the same value, I don't see how the mere existence of a cache could cause a problem here, since the cached data will be the same as well. Either way, you can disable the cache by [falsely] marking your input buffers as volatile. You probably do not need to remove restrict, though. If you do, the compiler has to assume the buffers may alias, and you will likely end up with fully sequential execution across the whole kernel.
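A minimal sketch of the suggested qualifier combination, assuming the toolchain is the Intel (formerly Altera) FPGA SDK for OpenCL that produces the area report mentioned above. The kernel name `scale_copy` and its body are hypothetical; the point is only the signature: the input pointer is `volatile` (to discourage the compiler from building a private cache for its loads) while every pointer keeps `restrict` (so the compiler can still assume no aliasing). The guarded macros let the same source build as plain C for a quick test; under an OpenCL compiler `__OPENCL_VERSION__` is defined and the real address-space qualifiers apply.

```c
/* Outside an OpenCL compiler, map the OpenCL qualifiers to nothing so
 * this file also builds as ordinary C99 for testing. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical single-work-item kernel.
 * - `volatile` on the input buffer: used here to ask the FPGA compiler
 *   not to infer a cache for these loads (assumed behavior; confirm in
 *   the area report after recompiling).
 * - `restrict` on both pointers: tells the compiler the buffers do not
 *   alias, so it need not serialize loads against stores. */
__kernel void scale_copy(__global const volatile float *restrict in,
                         __global float *restrict out,
                         const int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = 2.0f * in[i];
    }
}
```

After recompiling, the area report should show whether the cache on that load is actually gone; the `restrict` qualifiers are left in place deliberately, for the reason given in the post.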