Altera_Forum
10 years ago
I've researched the issue some more and distilled it down to simpler code:
// Streams SIZE elements from global memory into in_channel.
__kernel void in_streamer_a(__global const uint2* in) {
    printf("in streamer a start\n");
    for (uint i = 0; i < SIZE; ++i) {
        write_channel_altera(in_channel, in[i]);
    }
    printf("in streamer a stop\n");
}

// Same as in_streamer_a, fed from a second input buffer.
__kernel void in_streamer_b(__global const uint2* in) {
    printf("in streamer b start\n");
    for (uint i = 0; i < SIZE; ++i) {
        write_channel_altera(in_channel, in[i]);
    }
    printf("in streamer b stop\n");
}

// Drains SIZE results from out_channel back into global memory.
__kernel void out_streamer(__global uint2* out) {
    printf("out streamer start\n");
    for (uint i = 0; i < SIZE; ++i) {
        uint2 value = read_channel_altera(out_channel);
        out[i] = value;
    }
    printf("out streamer stop\n");
}

__kernel void worker() {
    printf("worker start\n");
    for (uint i = 0; i < SIZE; ++i) {
        write_channel_altera(out_channel,
                             read_channel_altera(in_channel) + read_channel_altera(in_channel));
    }
    printf("worker stop\n");
}
When I run all kernels in the emulator with a small SIZE parameter, I get the correct result and the following console output:
in streamer a start
in streamer a stop
in streamer b start
in streamer b stop
worker start
worker stop
out streamer start
out streamer stop
The kernels run sequentially, and the intermediate results appear to be buffered in the channel. This matches the official Altera documentation. However, when I use a large SIZE (10240 or more, to be precise), I get the following output and a deadlock:
in streamer a start
Only the first kernel ever runs, and it just hangs there. It could be that some internal buffer for the channel fills up, so the write blocks while no reader is running yet, but I don't know of any way to change that buffer's size. Maybe it's something else. Adding the following attribute to the channel declaration does not help:
__attribute__((depth(SUFFICIENTLY_HUGE_NUMBER)));
Note: everything works perfectly fine on the device, where the kernels are actually run in parallel.