--- Quote Start ---
I was surprised that it works as well, and only noticed it by accidentally leaving it in after debugging in the emulator.
--- Quote End ---
For anyone following this thread, I was able to find the solution to the deadlock. By printing each channel access with printf in CSV format, I found that the deadlock was occurring because the channel reads and writes were being executed in an unexpected order.
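To illustrate the printf approach, here is a minimal sketch in plain C of what one CSV trace record per channel access could look like. The helper name `format_trace_line` and the column layout (work-item id, loop iteration, operation) are assumptions made for illustration; the thread does not state the actual format. Inside a kernel running in the emulator, a similar format string would presumably be passed to the kernel-side printf directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one CSV record "id,iteration,op\n" into buf.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the record does not fit in buf. */
int format_trace_line(char *buf, size_t size, int id, int iteration, const char *op)
{
    assert(buf != NULL && op != NULL);
    int n = snprintf(buf, size, "%d,%d,%s\n", id, iteration, op);
    if (n < 0 || (size_t)n >= size) {
        return -1; /* encoding error or truncated output */
    }
    return n;
}
```

Sorting or filtering such records by the id column in a spreadsheet makes it easy to see which work-item touched which channel, and in what order.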
In my case I was using an NDRange kernel in which each work-item writes its data to its own channel inside a loop, as follows:
--- Quote Start ---
__kernel __attribute__((reqd_work_group_size(32,1,1))) void ExampleKernel() {
    int id = get_global_id(0);
    char4 data[NUM_ELEMENTS];
    // The data is processed here.
    ...
    // Each work-item writes its elements, one per iteration, to its own channel.
    for(int i=0; i<NUM_ELEMENTS; i++) {
        switch(id) {
            case 0: write_channel_altera(outputChannel[0],data[i]);break;
            case 1: write_channel_altera(outputChannel[1],data[i]);break;
            case 2: write_channel_altera(outputChannel[2],data[i]);break;
            case 3: write_channel_altera(outputChannel[3],data[i]);break;
            case 4: write_channel_altera(outputChannel[4],data[i]);break;
            case 5: write_channel_altera(outputChannel[5],data[i]);break;
            case 6: write_channel_altera(outputChannel[6],data[i]);break;
            case 7: write_channel_altera(outputChannel[7],data[i]);break;
            case 8: write_channel_altera(outputChannel[8],data[i]);break;
            case 9: write_channel_altera(outputChannel[9],data[i]);break;
            case 10: write_channel_altera(outputChannel[10],data[i]);break;
            case 11: write_channel_altera(outputChannel[11],data[i]);break;
            case 12: write_channel_altera(outputChannel[12],data[i]);break;
            case 13: write_channel_altera(outputChannel[13],data[i]);break;
            case 14: write_channel_altera(outputChannel[14],data[i]);break;
            case 15: write_channel_altera(outputChannel[15],data[i]);break;
            case 16: write_channel_altera(outputChannel[16],data[i]);break;
            case 17: write_channel_altera(outputChannel[17],data[i]);break;
            case 18: write_channel_altera(outputChannel[18],data[i]);break;
            case 19: write_channel_altera(outputChannel[19],data[i]);break;
            case 20: write_channel_altera(outputChannel[20],data[i]);break;
            case 21: write_channel_altera(outputChannel[21],data[i]);break;
            case 22: write_channel_altera(outputChannel[22],data[i]);break;
            case 23: write_channel_altera(outputChannel[23],data[i]);break;
            case 24: write_channel_altera(outputChannel[24],data[i]);break;
            case 25: write_channel_altera(outputChannel[25],data[i]);break;
            case 26: write_channel_altera(outputChannel[26],data[i]);break;
            case 27: write_channel_altera(outputChannel[27],data[i]);break;
            case 28: write_channel_altera(outputChannel[28],data[i]);break;
            case 29: write_channel_altera(outputChannel[29],data[i]);break;
            case 30: write_channel_altera(outputChannel[30],data[i]);break;
            case 31: write_channel_altera(outputChannel[31],data[i]);break;
        }
    }
}
--- Quote End ---
The above is an example of a (32,1,1) NDRange kernel. The writes arrived such that all 32 work-items finished the current loop iteration before any of them continued with the next one. Here is the order in which the data arrived:
id     loop iteration data
0      data[0]
1      data[0]
2      data[0]
...    ...
31     data[0]
0      data[1]
1      data[1]
2      data[1]
...    ...
...    ...
0      data[NUM_ELEMENTS-1]
1      data[NUM_ELEMENTS-1]
2      data[NUM_ELEMENTS-1]
...    ...
31     data[NUM_ELEMENTS-1]
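The order in the table can be reproduced with a small host-side model in plain C. This is only a sketch of the trace as listed above, not of the hardware scheduler; the names `channel_write`, `model_write_order`, and `NUM_WORK_ITEMS` are hypothetical and introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WORK_ITEMS 32 /* matches reqd_work_group_size(32,1,1) */

/* One channel write as it appears in the trace. */
struct channel_write {
    int id;        /* work-item id, which is also the channel index */
    int iteration; /* loop index i, meaning data[i] was written */
};

/* Hypothetical model of the observed order: every work-item completes
 * iteration i before any work-item starts iteration i + 1.
 * Fills out[0 .. num_elements * NUM_WORK_ITEMS - 1] and returns the count. */
size_t model_write_order(struct channel_write *out, size_t capacity, int num_elements)
{
    size_t n = 0;
    assert(out != NULL);
    assert(num_elements >= 0);
    assert(capacity >= (size_t)num_elements * NUM_WORK_ITEMS);
    for (int i = 0; i < num_elements; i++) {          /* loop iteration */
        for (int id = 0; id < NUM_WORK_ITEMS; id++) { /* work-items 0..31 */
            out[n].id = id;
            out[n].iteration = i;
            n++;
        }
    }
    return n;
}
```

In this model the loop iteration is the outer index, so a given channel receives data[1] only after all 32 channels have received data[0]. A consumer that expects all elements from one channel before reading the next would therefore see the writes in a different order than it expects.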