Forum Discussion
From what I understand, you are trying to use a channel to synchronize two kernels that share a global buffer: one kernel sends a token to the other once the data to be read has been written. I tried this once myself and it didn't work; the data got corrupted. When you write the data to the off-chip buffer and then write the token to the channel, the channel transfer has much lower latency than the off-chip memory transaction, so the other kernel will likely receive the token and read the memory location before it has actually been written to, and this will corrupt your data. I am not sure whether atomic reads/writes behave any differently, though. The important point is that the memory operation and the channel operation are in the same pipeline; the fact that the channel operation has started does NOT mean the memory operation before it has finished.
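To make the ordering problem concrete, here is a toy timing model in Python. It is only a sketch of the reasoning above, not a simulation of the actual hardware: the function name, the cycle counts, and the one-cycle issue gap are all made-up assumptions chosen for illustration.

```python
# Toy timing model of the race described above.
# All latencies are hypothetical cycle counts, not measured values.

def consumer_sees_new_data(mem_write_latency, channel_latency, issue_gap=1):
    """Return True if the consumer's read happens after the write has landed.

    The producer issues the global-memory write at cycle 0 and the channel
    write `issue_gap` cycles later, in the same pipeline. Issuing the channel
    write does not wait for the memory write to complete.
    """
    write_visible_at = mem_write_latency
    token_arrives_at = issue_gap + channel_latency
    # Assumption: the consumer reads the buffer as soon as the token arrives.
    return token_arrives_at >= write_visible_at


if __name__ == "__main__":
    # Slow off-chip write, fast channel: the token overtakes the data.
    print(consumer_sees_new_data(mem_write_latency=200, channel_latency=5))    # False
    # Only if the token were slower than the write would the read be safe.
    print(consumer_sees_new_data(mem_write_latency=200, channel_latency=300))  # True
```

With a fast channel and a slow off-chip write, the token reaches the consumer long before the data is visible in memory, which matches the corruption I saw.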
You can also take a look at "Intel FPGA SDK for OpenCL - Programming Guide: Defining Memory Consistency Across Kernels When Using Channels". I tried using mem_fence(CLK_GLOBAL_MEM_FENCE) to mitigate the above issue, but that didn't work either. Then again, the description in that part of the document is not very clear, and mem_fence is probably not designed for the purpose I was trying to use it for.