Intel HLS - high load / store latency
Hello,
I am benchmarking a few common HLS tools and have run into an issue with Intel HLS. I implemented a simple histogram to test the tool, but the test-fpga report shows unusually high latency for load / store operations, which pushes my II far above what I expected.
According to the report, a single load / store operation takes 31 cycles. This leads me to believe that, the way I wrote the histogram, the tool is not using the FPGA's embedded (on-chip) memory, which I would expect to have about a 1-cycle load / store latency at the 200-300 MHz I am targeting. What do I need to specify so that the design uses the on-chip memory, or to otherwise reduce the load / store latency?
Below you can find the C++ code I'm synthesizing. Note that the goal is to pre-initialize the RAM with the inputs to the histogram function, and the function should then iterate over them.
    #include <HLS/hls.h>
    #include <stdio.h>
    #include <iostream>

    #define N 100

    using namespace ihc;

    component void histogram(
        int feature[],
        float weight[],
        float hist[],
        int n
    ) {
        int i;
        for(i = 0; i < n; i++) {
            int m = feature[i];
            float wt = weight[i];
            float x = hist[m];
            hist[m] = x + wt;
        }
    }

    int main() {
        hls_memory hls_singlepump int feature[N];
        hls_memory hls_singlepump float weight[N];
        hls_memory hls_singlepump float hist[N];

        int i;
        for(i = 0; i < N; i++) {
            feature[i] = i + 1;
            weight[i] = (float) (2 * i);
            hist[i] = 0.0f;
        }

        histogram(feature, weight, hist, N - 1);

        bool failed = false;
        for(i = 0; i < N; i++) {
            float val = hist[i];
            if(i == 0) {
                if(val != 0.0) {
                    failed = true;
                    break;
                }
            } else {
                if(val != (float) ((i - 1) * 2)) {
                    failed = true;
                    break;
                }
            }
        }

        if(failed) {
            printf("FAILED");
        } else {
            printf("PASSED");
        }

        return 0;
    }
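In case it helps anyone who wants to check the expected results without the Intel toolchain, here is a minimal portable sketch of the same computation in plain C++. The names histogram_ref and run_reference_test are hypothetical helpers for illustration only; they are not part of my component and use nothing from HLS/hls.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software-only reference of the histogram loop.
// It only models the arithmetic, not any hardware memory behaviour.
void histogram_ref(const std::vector<int>& feature,
                   const std::vector<float>& weight,
                   std::vector<float>& hist,
                   std::size_t n) {
    assert(n <= feature.size() && n <= weight.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int m = feature[i];
        assert(m >= 0 && static_cast<std::size_t>(m) < hist.size());
        // Read-modify-write on a data-dependent address: the store to
        // hist[m] must complete before a later load of the same bin.
        hist[m] += weight[i];
    }
}

// Builds the same inputs as the testbench above and checks the result.
// Returns true when every bin holds the expected value.
bool run_reference_test(std::size_t size) {
    if (size == 0) {
        return true;  // nothing to check
    }
    std::vector<int> feature(size);
    std::vector<float> weight(size);
    std::vector<float> hist(size, 0.0f);
    for (std::size_t i = 0; i < size; ++i) {
        feature[i] = static_cast<int>(i) + 1;
        weight[i] = static_cast<float>(2 * i);
    }

    // Process only the first size - 1 samples, so the largest bin
    // index touched is size - 1.
    histogram_ref(feature, weight, hist, size - 1);

    if (hist[0] != 0.0f) {
        return false;  // bin 0 is never written
    }
    for (std::size_t i = 1; i < size; ++i) {
        if (hist[i] != static_cast<float>(2 * (i - 1))) {
            return false;
        }
    }
    return true;
}
```

Calling run_reference_test(100) mirrors the N = 100 case from my testbench. The comment in the loop marks the load / store pair on hist[m] that I suspect is driving the II.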
For reference, I'm synthesizing on the default Arria 10 board.
Also, if you have any tips on improving my code or some standard practices which I'm unaware of, I'll gladly take them.
Thanks in advance.