Unexpectedly low kernel clock frequency
I'm working on an OpenCL kernel targeting a Cyclone V SoC that should process a continuous real-time sample stream at a sample rate of 16 MHz. The kernel clock therefore has to be fast enough for the kernel to keep up with the data stream. Coming from traditional VHDL design flows, I'm quite certain that a clock frequency of approx. 40 MHz should not be an issue for the Cyclone V.
However, the kernel is extremely slow: the Dynamic Profiler shows that the kernel clock runs at 1.3 MHz. How can I investigate what slows the kernel clock down to such a low frequency, and what are the best practices for increasing it?
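To put these numbers in perspective, here is a rough back-of-the-envelope sketch in plain C. It assumes the kernel loop accepts one sample per kernel clock cycle. That is an assumption, since the achieved initiation interval is not stated here. The helper names are made up for illustration.

```c
/* Back-of-the-envelope throughput check (illustrative helper names). */

/* Minimum kernel clock needed to keep up with the stream.
   ASSUMPTION: the kernel consumes samples_per_cycle samples per clock cycle. */
static double min_kernel_clock_hz(double sample_rate_hz, double samples_per_cycle)
{
    return sample_rate_hz / samples_per_cycle;
}

/* Time in seconds to move num_samples samples at a given rate in Hz. */
static double block_time_s(double num_samples, double rate_hz)
{
    return num_samples / rate_hz;
}
```

With one sample per cycle, `min_kernel_clock_hz(16e6, 1.0)` gives 16 MHz. A block of 32768 samples arrives in `block_time_s(32768, 16e6)`, about 2.05 ms, while a kernel clocked at 1.3 MHz would need `block_time_s(32768, 1.3e6)`, about 25.2 ms, to drain it. Under that assumption the kernel falls behind by a factor of roughly 12.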
See the attached screenshot for details
Profiling Results:
The Qsys System:
The kernel code:
#pragma OPENCL EXTENSION cl_intel_channels: enable

struct TwoChannelSample
{
    short2 chanA;
    short2 chanB;
};

#define FIFO_DEPTH 32768

channel struct TwoChannelSample rxSamps __attribute__((depth(0))) __attribute__((io("THDB_ADA_rxSamples")));
channel struct TwoChannelSample txSamps __attribute__((depth(0))) __attribute__((io("THDB_ADA_txSamples")));
channel ushort stateChan __attribute__((depth(0))) __attribute__((io("THDB_ADA_state")));

kernel void thdbADARxTxCallback (global const float2* restrict txSamples,
                                 global float2* restrict rxSamples,
                                 global ushort* restrict interfaceState)
{
    // get state from interface
    *interfaceState = read_channel_intel (stateChan);

    // Process sample-wise
    for (int i = 0; i < FIFO_DEPTH; ++i)
    {
        struct TwoChannelSample rxSample = read_channel_intel (rxSamps);

        rxSamples[i].x = (float)rxSample.chanA.x;
        rxSamples[i].y = (float)rxSample.chanA.y;
        rxSamples[i + FIFO_DEPTH].x = (float)rxSample.chanB.x;
        rxSamples[i + FIFO_DEPTH].y = (float)rxSample.chanB.y;
    }
}
}

You seem to be using a custom-made BSP with multiple custom I/O channels; your critical path very likely lies in your BSP. You can try compiling an empty OpenCL kernel to see what operating frequency you will get. If what you get is still in the same range, then your critical path is in the BSP and you should optimize your BSP.
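A minimal sketch of such a test kernel could look like the following. The kernel name and the single store are assumptions made for illustration: the store is there only so the kernel has an observable effect, and it does not touch any of the custom I/O channels. The `#ifndef` guard is an optional convenience that lets the same file pass a plain C syntax check; an OpenCL compiler defines `__OPENCL_VERSION__` and skips it.

```c
/* Hypothetical near-empty kernel for checking the BSP's baseline fmax. */
#ifndef __OPENCL_VERSION__
/* Not compiling as OpenCL C: stub out the OpenCL qualifiers. */
#define kernel
#define global
#endif

/* Does no real work: writes a single zero to global memory. */
kernel void emptyKernel (global int* restrict out)
{
    *out = 0;
}
```

If this kernel also ends up with a clock in the 1 MHz range when compiled against the same BSP, the kernel code from the question is not what limits the frequency.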