Forum Discussion
Channels are pretty much never a source of bottleneck. Channel stalls pretty much always are stalls propagated from the top of the pipeline, very likely global memory operations. Since the operations are pipelined, it doesn't matter what the start cycle of the operation is since the start latency will be amortized if the pipleine is kept busy for long enough. Stall rate for channel reads and writes do not need to be balanced. In fact, they definitely won't be balanced if the source of the stall is somewhere else. e.g. if the source of the stall is globall memory, channels writes will never stall sine the channel never becomes full, but channel reads will stall since the channel is faster than globabl memory and it becomes empty frequently, waiting for data to arrive from global memory. The behavior you are seeing is completely normal. The document I posted above also describes how to find the source of stalls in a kernel.