Regarding the channel reordering, I think I now understand that the compiler always decouples channel operations from other read/write operations and uses extra registers (register renaming?) to handle dependencies such as the one discussed here, which makes sense. Hence, in this case, if the cycle of channels did not exist, the channel operations in the "receive" kernel would still have been reordered, but no data corruption would have occurred because the dependency is handled through the extra registers. However, because of the cycle of channels combined with the channel reordering, a deadlock happens at run time unless channel ordering is enforced using mem_fence.
Still, since I also thought all this time that channel reordering would not happen when data dependencies are involved, I would say the relationship between channel ordering and data dependencies could be very confusing for people who do not come across this thread, and it would probably be best to explain it somewhere in the documentation.
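
For anyone who lands here later, this is roughly what enforcing the order with mem_fence looks like. It is only a minimal sketch, not the kernel from this thread: the channel names, the data type, and the loop are made up for illustration, and I am assuming the cl_intel_channels extension with the CLK_CHANNEL_MEM_FENCE flag as described in Intel's programming guide (older SDK releases use the cl_altera_channels pragma and the *_altera call names instead).

```c
// Hypothetical sketch: names and kernel body are illustrative only.
#pragma OPENCL EXTENSION cl_intel_channels : enable

channel int ch_in;   // hypothetical channel: other kernel -> "receive"
channel int ch_out;  // hypothetical channel: "receive" -> other kernel (closes the cycle)

__kernel void receive(int n)
{
    for (int i = 0; i < n; i++) {
        // Blocking read from the incoming channel.
        int v = read_channel_intel(ch_in);

        // Request that the channel read above stays ordered before
        // the channel write below, instead of leaving the order to the compiler.
        mem_fence(CLK_CHANNEL_MEM_FENCE);

        // Blocking write that depends on the value just read.
        write_channel_intel(ch_out, v);
    }
}
```

The point is only the placement of the fence between the two channel calls. The data dependency on v is there in both cases, so on its own it should not be read as a guarantee of channel ordering.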