Ordering of channel operations
Hello,
I'm currently struggling with enforcing the order of writes and reads on channels.
I know, from the Intel programming guide, that the compiler may reorder two independent channel operations in order to generate efficient hardware.
However, this seems to occur even if there is a clear dependence. I've created a minimal working example for this:
#pragma OPENCL EXTENSION cl_intel_channels : enable
//message
typedef struct{
    bool request;
    int data;
}message_t;
//represents the status of the computation
typedef struct{
    bool start;
    message_t m;
}computation_t;
channel message_t channels[2] __attribute__((depth(2)));
// Auxiliary function for receiving data
void receive(computation_t *status, int *data){
    if(status->start){
        //at the beginning send the request for data
        write_channel_intel(channels[0],status->m);
        status->start=false;
    }
    //receive the data and store it
    status->m=read_channel_intel(channels[1]);
    *data=status->m.data;
}
__kernel void comp(const int N, const int start, __global int *mem){
    int data;
    computation_t status;
    status.start=true;
    status.m.data=N;
    for(int i=0;i<N;i++)
    {
        //receive data, increment and store it to memory
        receive(&status,&data);
        data++;
        mem[i]=data;
    }
}
//generates a stream of data upon request
__kernel void generator(){
    //receive the request
    message_t m=read_channel_intel(channels[0]);
    for(int i=0;i<m.data;i++)
    {
        message_t send;
        send.data=i;
        send.request=false;
        write_channel_intel(channels[1],send);
    }
}
The "comp" kernel consists of a pipelined loop in which it receives data from the "generator" kernel through the "receive" function. In the first iteration, it sends a request to the generator so that the generator produces the right amount of data.
If I try to compile this, the channel operations of the "receive" function are re-ordered, as can be seen from the report.
This occurs even though there is a clear dependency between the two.
Clearly, if the hardware executes the read first, this will lead to a deadlock.
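To make the deadlock argument concrete, here is a small model in plain C (host-side, not OpenCL, and not what the compiler actually generates). It is only a sketch under my own assumptions: channels are modeled as bounded FIFOs that track occupancy only, both kernels are stepped in a single loop, and the names fifo_t, fifo_push, fifo_pop and simulate are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEPTH 2  /* mirrors depth(2) in the channel declaration */

/* Occupancy-only model of a bounded FIFO channel (hypothetical helper). */
typedef struct { int count; } fifo_t;

static bool fifo_push(fifo_t *f) {
    if (f->count >= DEPTH) return false;  /* full: the writer stalls */
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f) {
    if (f->count == 0) return false;      /* empty: the reader stalls */
    f->count--;
    return true;
}

/* Steps "comp" and "generator" until both finish or neither can advance.
 * read_first == false: program order (request write, then data read).
 * read_first == true : assumed reordered schedule (first read before write).
 * Returns true if both kernels complete, false on deadlock. */
bool simulate(int n, bool read_first) {
    assert(n >= 1);                 /* the model assumes at least one item */
    fifo_t req = {0}, dat = {0};    /* stand-ins for channels[0], channels[1] */
    int comp_reads = 0;             /* items consumed by "comp" */
    bool comp_sent = false;         /* request already written by "comp" */
    bool gen_got_req = false;       /* request already read by "generator" */
    int gen_writes = 0;             /* items produced by "generator" */

    for (;;) {
        bool progress = false;

        /* "comp": decide which channel operation is next in its schedule. */
        if (comp_reads < n) {
            bool write_next = read_first ? (comp_reads >= 1 && !comp_sent)
                                         : !comp_sent;
            if (write_next) {
                if (fifo_push(&req)) { comp_sent = true; progress = true; }
            } else if (fifo_pop(&dat)) {
                comp_reads++;
                progress = true;
            }
        }

        /* "generator": read the request once, then stream n items. */
        if (!gen_got_req) {
            if (fifo_pop(&req)) { gen_got_req = true; progress = true; }
        } else if (gen_writes < n) {
            if (fifo_push(&dat)) { gen_writes++; progress = true; }
        }

        if (comp_reads == n && gen_got_req && gen_writes == n) return true;
        if (!progress) return false;  /* both sides blocked: deadlock */
    }
}
```

Under these assumptions, simulate(5, false) (program order) returns true, while simulate(5, true) (read scheduled before the write) returns false, because "comp" waits for data that "generator" never sends and "generator" waits for a request that "comp" never writes.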
This happens with Quartus 18.1 and 19.1 (Stratix 10 as target board).
In your opinion, is this a compiler bug, or do I have to handle this in a different way?
Thanks