User Profile

Mopplikus

New Contributor

Joined 2 years ago

6 Posts

Contributions

Re: Intel HLS - simulating the design with ModelSim
Hello, I found the solution to my issue: I needed to add the -ghdl flag to the i++ command, which then generates a .wlf file in the verification folder that I can open with ModelSim. For future reference, to get the waveforms, select the instance of your top-level component, then add the waves for the signals you need from the Objects tab. However, I'm now encountering another issue: no matter what target frequency I specify, the clock in the waveform is always 1000 MHz, even if I override it with the --clock flag when compiling, or for individual components with hls_scheduler_target_fmax_mhz(). How could I fix this? -Mopplikus
2 years ago Place Acceleration
1.5K views, 0 likes, 0 comments
Intel HLS - simulating the design with ModelSim
Hello, I have generated a design using Intel HLS, and now I'd like to look at the corresponding waveforms in ModelSim / QuestaSim. Since a testbench has already been generated for the co-simulation, I shouldn't need to create a new one, so how do I go about 'hijacking' the existing testbench to get to the waveforms? By the way, I'm using version 21.4.0 of the Intel HLS Compiler. Thanks in advance! -Mopplikus
2 years ago Place Acceleration
High-level Design Tools
1.6K views, 0 likes, 5 comments
Intel HLS - dynamic scheduling
Hello, I've been experimenting with Intel HLS, but I'm struggling to see the performance benefits of dynamic scheduling. I've written a test component whose performance should vary with the inputs I supply to it:

```cpp
#include <HLS/hls.h>

using namespace ihc;

// Memory-mapped host interfaces for the two input arrays.
typedef ihc::mm_host<int, ihc::dwidth<256>, ihc::awidth<32>, ihc::aspace<1>, ihc::latency<1> > mem_1;
typedef ihc::mm_host<int, ihc::dwidth<256>, ihc::awidth<32>, ihc::aspace<4>, ihc::latency<1> > mem_2;

component int if_loop_3(mem_1 &a, mem_2 &b, int n) {
    int sum = 1000;
    for (int i = 0; i < n; i++) {
        int dist = a[i] - b[i];
        if (dist >= 0) {
            // Long-latency integer division. Note: dist == 0 divides by zero,
            // so the test inputs must avoid a[i] == b[i].
            sum = sum / dist;
        }
    }
    return sum;
}
```

Here I am comparing the best case against the worst case, i.e. the case where the if condition is never true against the one where it is always true, with an integer division as a long-latency operation to make the difference clear. However, when I run the co-simulation with Intel HLS, the report always shows the same latency, no matter whether the testbench supplies best-case or worst-case inputs to the component, which suggests that no dynamic scheduling is taking place. Is this correct? Do I need to tell the compiler to use dynamic scheduling to optimize the runtime? Thanks in advance. Also, feel free to give me any suggestions for improving my code. -Mopplikus
2 years ago Place Acceleration
1.4K views, 0 likes, 3 comments
Re: Intel HLS - high load / store latency
Hello, yes, you may close this thread. Kind regards.
2 years ago Place Acceleration
1.9K views, 0 likes, 0 comments
Re: Intel HLS - high load / store latency
Update: I found a solution to my problem. It turns out that I needed to initialize the on-board memory inside a component, since initializing it in the main function presumably makes the loads and stores go through the board's I/O rather than the embedded memory. Moving the initialization into the component solved the issue and brought the load/store latency down to the expected single cycle.
2 years ago Place Acceleration
2K views, 0 likes, 0 comments
Intel HLS - high load / store latency
Hello, I am in the process of benchmarking a few common HLS tools, and I'm having some issues with Intel HLS. I've implemented a simple histogram to test the tool; however, in the test-fpga report I'm getting unusually high latency from load/store operations, which raises my II far above normal levels. According to the report, a load/store operation takes 31 cycles, which leads me to believe that, the way I wrote the histogram, the tool does not use the embedded memory on the board (which should have a 1-cycle load/store latency, given that I expect this circuit to run in the 200-300 MHz range). What do I need to specify to make it use the on-board memory, or otherwise to reduce the load/store latency? Below is the C++ code I'm synthesizing. Note that the goal is to pre-initialize the RAM with the inputs to the histogram function, which should then iterate over them.

```cpp
#include <HLS/hls.h>
#include <stdio.h>

#define N 100

using namespace ihc;

// Weighted histogram: hist[feature[i]] += weight[i] for the first n entries.
component void histogram(int feature[], float weight[], float hist[], int n) {
    for (int i = 0; i < n; i++) {
        int m = feature[i];
        float wt = weight[i];
        float x = hist[m];
        hist[m] = x + wt;
    }
}

int main() {
    hls_memory hls_singlepump int feature[N];
    hls_memory hls_singlepump float weight[N];
    hls_memory hls_singlepump float hist[N];

    // Inputs: feature[i] = i + 1, weight[i] = 2 * i, hist zeroed.
    for (int i = 0; i < N; i++) {
        feature[i] = i + 1;
        weight[i] = (float)(2 * i);
        hist[i] = 0.0f;
    }

    // Only N - 1 entries are processed so that hist[feature[i]] stays in bounds.
    histogram(feature, weight, hist, N - 1);

    // Expected: hist[0] == 0 and hist[i] == 2 * (i - 1) for i >= 1.
    bool failed = false;
    for (int i = 0; i < N; i++) {
        float expected = (i == 0) ? 0.0f : (float)((i - 1) * 2);
        if (hist[i] != expected) {
            failed = true;
            break;
        }
    }

    printf("%s\n", failed ? "FAILED" : "PASSED");
    return 0;
}
```

For reference, I'm synthesizing for the default Arria 10 board. Also, if you have any tips on improving my code or standard practices I'm unaware of, I'll gladly take them. Thanks in advance.
2 years ago Place Acceleration
2K views, 0 likes, 4 comments