Hello, I'm currently struggling with enforcing the order of write/read to channels. I know, from the Intel programming guide, that two independent channels operation can be re-ordered by the compil...
I see the II increase, it jumps to 7 by inserting a fence exactly like you described (it jumps to 20 if the status variable is stored in memory). This is why the diagram was red in the picture attached above.
Also, if I compile with attibute memory, I obtain a different diagram wrt to yours (I don't have 4 stores, just one).
Still, I don't understand the argument about why channels are considered to be independent:
write channel take in input a variable V;
read channel writes into a variable Z
If Z==V, the compiler must enforce that the two operations are not swapped.
If this is not the case, please explain it clearly into your documentation.
Your reply with the tmp example clarified what is happening (thanks).
Still, I'm not understanding why is this happening: are there any explanations about the compiler not taking care of that dependencies?
Furthermore, having a cyclic dependencies (that the barrier does not resolve) is a thing that could happen frequently and ordering is what you rely on to build client-server patterns for example.
Concerning the higher II, I have the same II (7) even with the 19.1 compiler. I've attached the reports (the same occurs if I use the s10gx BSP included in the reports). Do you have any idea about why is this happening?
You are compiling against Stratix 10 while @douglas.prinn is probably compiling against Arria 10. On Arria 10, II is one after adding the mem_fence while on Stratix 10 it becomes 7. This is likely because the default target operating frequency on Arria 10 is 240 MHz, while on Stratix 10 it has been increased to 480 MHz. The solution is to probably not use channels at all (or multi-kernel designs, or the autorun kernel feature, or any other useful feature of the compiler that worked perfectly fine on previous generation FPGAs but for some unexplained reason, has "high overhead" on Stratix 10) as written in Section 9 of the Best Practices Guide. You can also try reducing the Fmax targert using the --fmax switch to reduce the II, but that will likely also lower your final post-place-and-route operating frequency.
I agree with you: iIf I compile against Arria 10 I also obtain an II=1.
For the Stratix 10:
the fmax switch does not help;
if I use non-blocking channels (like suggest in the Intel documentation), I get a Serial Exe on the for loop in the comp kernel (i.e. it is not pipelined)
Therefore, @douglas.prinn are there any ways of obtaining an II=1 for a Stratix 10?
You can get ii = 1 with Stratix 10 by removing the data dependency on the variable status in the main loop. There are several ways to remove the dependency.
One way is by moving the status variable inside the loop. See below.
__kernel void comp(const int N, const int start, __global int *mem){
Right. I see now. You are writing the previous value of status that you read from channel 1 into channel 0, so you can't move status inside the loop. The problem is that if the value of status is carried across loop iterations, iteration i+1 can't start until the value of status has been updated by iteration i. I’m not sure how to remove the dependency. I will have to think about it a little more.