Forum Discussion
1. Reading from or writing to the same channel inside a for loop creates only one "call site", unless the loop is unrolled. Only one loop iteration accesses the channel in each clock cycle, so only one read/write port exists. Why do you think this creates multiple call sites? Apart from the first example on page 28, which shows the kind of channel usage that results in a compilation failure due to multiple call sites, the documentation contains no other example with multiple call sites. Note that you can also declare multiple channels the same way you declare an array (channel_name[number_of_channels]). In that case, each channel_name[i] (0 <= i <= number_of_channels - 1) is a separate channel, and call sites to all of these channels can coexist in the same kernel as long as no channel ID is repeated.
2. Blocking channel reads/writes and the ordering of channel operations are not tied to each other. Unless you use fences or there is a dependency between your channel operations, the compiler can and will reorder channel operations (just as it reorders the other operations in the kernel) to create the most efficient hardware.
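
To make both points concrete, here is a rough sketch of what I mean. It assumes the Intel-named channel extension (cl_intel_channels, read_channel_intel, write_channel_intel); older Altera releases use the _altera spellings instead. The kernel names, channel names, and NUM_CH are made up for illustration, and I have not compiled this exact code, so treat it as a sketch rather than a reference.

```opencl
// Sketch only: assumes the Intel FPGA SDK for OpenCL channel extension.
// Older Altera releases spell these cl_altera_channels / *_channel_altera.
#pragma OPENCL EXTENSION cl_intel_channels : enable

#define NUM_CH 2                 // hypothetical channel count

channel int data_ch;             // a single channel
channel int lane_ch[NUM_CH];     // NUM_CH separate channels: lane_ch[0], lane_ch[1]

// Point 1: the loop is not unrolled, so this is ONE write call site on data_ch,
// reached once per iteration.
__kernel void producer(__global const int *restrict src, int n) {
    for (int i = 0; i < n; i++) {
        write_channel_intel(data_ch, src[i]);
    }
}

// Point 1, array form: the inner loop is fully unrolled, which gives NUM_CH
// call sites, but each one targets a different channel ID, so they can coexist.
__kernel void fan_out(int n) {
    for (int i = 0; i < n; i++) {
        int v = read_channel_intel(data_ch);   // one read call site on data_ch
        #pragma unroll
        for (int c = 0; c < NUM_CH; c++) {
            write_channel_intel(lane_ch[c], v + c);
        }
    }
}

// Point 2: the reads are blocking, but blocking alone does not fix their order.
// The channel fence asks the compiler to keep the first read ahead of the second.
__kernel void consumer(__global int *restrict dst, int n) {
    for (int i = 0; i < n; i++) {
        int first = read_channel_intel(lane_ch[0]);
        mem_fence(CLK_CHANNEL_MEM_FENCE);
        int second = read_channel_intel(lane_ch[1]);
        dst[i] = first + second;
    }
}
```

If you drop the mem_fence line in consumer, the two reads have no dependency on each other, so the compiler is free to schedule them in whichever order produces the better hardware.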