Multiple read/write from channels
Hello,
I've two kernels: a producer (that generates a stream of float numbers) and a consumer (that read the numbers and compute the sum):
#define W 32
channel float channel_float __attribute__((depth(W)));
//generates a stream of floating point numbers
__kernel void producer(int N)
{
int outer_loop_limit=(int)(N/(W));
int remainder=N-(outer_loop_limit)*(W);
for(int i=0;i<outer_loop_limit;i++)
{
#pragma unroll
for(int j=0;j<W;j++)
write_channel_intel(channel_float,(float)(2.0));
}
//Remainder
for(int i=0;i<remainder;i++)
write_channel_intel(channel_float,(float)(2.0));
}
__kernel void consumer(int N, __global float* res)
{
int outer_loop_limit=(int)(N/(W));
int remainder=N-(outer_loop_limit)*(W);
float acc_o=0, acc_i=0;
float mult[W], x[W];
for(int i=0; i<outer_loop_limit; i++)
{
#pragma unroll
for(int j=0; j<W; j++)
x[j]=read_channel_intel(channel_float);
acc_i=0;
#pragma unroll
for(int j=0; j<W; j++)
acc_i+=x[j];
acc_o+=acc_i;
}
//Remainder
acc_i=0;
for(int i=0;i<remainder;i++)
{
x[i]=read_channel_intel(channel_float);
acc_i+=x[i];
}
acc_o+=acc_i;
*res=acc_o;
}To increase the number of writes/reads per clock cycle, I've applied unrolling (W times). So to handle a stream of N numbers we may have a little bit of extra code to handle the case in which N is not a multiple of W (in the code is called Remainders, lines 17-18 and 44-49).
Now, if I compile the code commenting the remainder part, everything goes ok: internal loops are fully unrolled and outer loops are pipelined with II=1.
Instead, If I have the remainder management, the compiler print a list of warnings like "Multiple writes to channel channel_float This may lead to bad QoR" and loops are pipelined with a II=34, claiming that there is a dependency in the writes into the channels.
Any hints on how to solve the problem?
Quartus version is 18.0, compiled for Arria10.