--- Quote Start ---
Hello everyone,
Under the section "Use Models of AOCL Channels Implementation" of the AOCL programming guide, it says that buffer management can be used to transfer large data messages between kernels. The implementation example given uses "__global volatile" as the memory type. I am wondering what kind of hardware this kind of memory maps to. Is it implemented in DDR, the same as normal global memory? Or is it implemented as a FIFO, the same as a channel?
Also, what is the benefit of using buffer management compared to plain global memory or multiple channels? Does it offer higher throughput or lower latency?
Thanks!
--- Quote End ---
In this case, I believe the data is written to global memory. Essentially, the producer stores the data in global memory and then sends a small token to the consumer indicating where in global memory the data was written, so the consumer can go fetch it. The main benefit is concurrency. If you use only global memory, the host has to handle the movement of data from one kernel to the other, which can be slow: the host has to enqueue a read of the producer's result from global memory and then enqueue it as input to the consumer kernel. Compared to channels, if there is a large amount of data, writing it to what is essentially a shared memory region is beneficial because, instead of requiring a very deep FIFO, the producer doesn't have to wait on the consumer whenever the FIFO is full.
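To make that token idea concrete, here is a minimal sketch using the cl_altera_channels extension. The channel name `token`, the buffer arguments, and the size `N` are my own illustrative choices, not code from the guide:

```c
#pragma OPENCL EXTENSION cl_altera_channels : enable

#define N 1024  // illustrative message size, not from the guide

// The token channel carries only a small notification, not the payload,
// so its FIFO stays shallow regardless of how large the message is.
channel int token;

__kernel void producer(__global volatile float16 *restrict buf) {
    // Bulk data goes to global memory (DDR), not through a channel.
    for (int i = 0; i < N; i++)
        buf[i] = (float16)(i);
    // Tell the consumer where the data starts (offset 0 here).
    write_channel_altera(token, 0);
}

__kernel void consumer(__global volatile float16 *restrict buf,
                       __global float16 *restrict out) {
    // Blocks until the producer signals that the data is ready.
    int offset = read_channel_altera(token);
    for (int i = 0; i < N; i++)
        out[i] = buf[offset + i];
}
```

The point is that only the 4-byte token goes through channel hardware; the float16 payload never touches a FIFO.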
--- Quote Start ---
BTW: Why is the 3rd buffer-management kernel necessary at all? Is it possible to implement cyclic channel access between two kernels without this buffer management? Or is there any negative implication of doing so? For example, why can't I just implement something like this?
#pragma OPENCL EXTENSION cl_altera_channels : enable

channel float16 feed_forward;
channel float16 feed_back;

__kernel void producer ( ... ) {
    ... Transfer data from global to local memory ...
    for (int i = 0; i < iteration; i++) {
        ... Computations ...
        write_channel_altera(feed_forward, data_in_local_memory);
        data_in_local_memory = read_channel_altera(feed_back);
    }
}

__kernel void consumer ( ... ) {
    for (int i = 0; i < iteration; i++) {
        data_in_local_memory = read_channel_altera(feed_forward);
        ... Computations ...
        write_channel_altera(feed_back, data_in_local_memory);
    }
    ... Transfer data from local back to global memory ...
}
--- Quote End ---
So with the above approach, if the channel writes and reads are not in sync, contention can occur: if the consumer has an expensive computation, the producer stalls because it is waiting on the consumer to drain the channel. With buffer management, data is written to shared global memory, which is on the order of gigabytes; even with deeply buffered channels, you can't hold anywhere near that much data. This becomes fairly important if you're doing real-time processing: if the producer stalls, you'll be dropping incoming data while waiting for the consumer to free up.
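One way to see how the managed version decouples the two kernels is a double-buffered sketch: the bulk data lives in a global scratch buffer split into two halves, and only small ownership tokens cycle between the kernels. This is my own sketch, not the guide's code; the channel names, `BUF_SIZE`, and the two-half layout are assumptions:

```c
#pragma OPENCL EXTENSION cl_altera_channels : enable

#define BUF_SIZE 1024  // illustrative block size

// Small token channels coordinate ownership of the two halves of the
// scratch buffer. depth(2) lets the consumer's final tokens sit unread
// without stalling it at the end of the run.
channel int buf_ready __attribute__((depth(2)));  // producer -> consumer
channel int buf_free  __attribute__((depth(2)));  // consumer -> producer

__kernel void producer(__global volatile float16 *restrict scratch,
                       int num_blocks) {
    for (int b = 0; b < num_blocks; b++) {
        int half = b & 1;
        // The first two blocks start with both halves free; after that,
        // wait until the consumer has drained this half.
        if (b >= 2)
            read_channel_altera(buf_free);
        for (int i = 0; i < BUF_SIZE; i++)
            scratch[half * BUF_SIZE + i] = (float16)(b);  // produce in DDR
        write_channel_altera(buf_ready, half);  // hand this half over
    }
}

__kernel void consumer(__global volatile float16 *restrict scratch,
                       __global float16 *restrict out,
                       int num_blocks) {
    for (int b = 0; b < num_blocks; b++) {
        int half = read_channel_altera(buf_ready);  // wait for a full half
        for (int i = 0; i < BUF_SIZE; i++)
            out[b * BUF_SIZE + i] = scratch[half * BUF_SIZE + i];
        write_channel_altera(buf_free, half);  // return the empty half
    }
}
```

Here the producer only blocks when it is two full blocks ahead of the consumer, instead of stalling on every float16 element the moment a FIFO fills.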
Hope this helps.