The compilation time is too long for Intel FPGA OpenCL
I am trying to compile an HLS project with Intel SDK for OpenCL 20.3 on a DE10-Pro. This project used to take 5 to 6 hours to compile with Intel SDK for OpenCL 19.4 on an Arria 10, but the current compile has now been running for more than 17 hours. The resource consumption in Linux is shown below; it does not look like much of the machine is being used. The most recently generated compilation file is also shown below, and it suggests that routing finished successfully. However, since top.fit.route.rpt was generated, 5 hours have passed without any file being updated. I want to know whether it is usual for compilation to take this long, and how I can reduce the compilation time in this flow.

I want to know how to control hyper-optimized handshaking setting
I have just started using Quartus Pro 20.3 to compile for the DE10-Pro. I used prefetch_load in my code and got the following error:

    Compiler Error: Prefetching LSU is not available when hyper-optimized handshaking is enabled

After I deleted prefetch_load and went back to a normal global memory access, the error disappeared. But I found that in my compile report, hyper-optimized handshaking in the Kernel Summary is off. Why can't I apply this feature after removing the prefetch_load LSU, and how does it affect my design?

Intel OpenCL compiler (aoc) does not coalesce global memory reads anymore
The two screenshots say it all. The old screenshot was generated with aoc 21.2.0; note how it coalesces the 16 float reads into one 512-bit DDR read. The new screenshot was generated with aoc 2024.2.1; it does not coalesce the 16 float reads and instead creates 16 individual read ports. As far as I can tell, that is quite bad for performance and wastes a lot of hardware resources. Is there a way to make aoc 2024.2.1 coalesce exactly as the old compiler did?

Choosing FPGA board for ML implementation using oneAPI
Hello, I wish to implement transformer modules on an Intel FPGA using oneAPI. HBM is preferred, and compatibility with the oneAPI workflow is important. Some options I checked were:

- Stratix 10 NX
- Stratix 10 MX

I did not find much about using the AI tensor blocks with oneAPI, so I wanted to check whether there are any restrictions from that perspective. Other suggestions for FPGA boards would be great too. Thanks

Valid substitute to EN5396QI-T
Good morning everyone, I am trying to find valid alternatives to the EN5396QI-T, EN5366QI, and EN5337QI-T, which are now obsolete. The substitute must match the output current, the switching frequency, and the low input voltage, and it must have an integrated inductor too. Do you know of any valid alternatives? At the moment, the best I have found are the MPS DC-DC converters, but unfortunately they do not reach a 5 MHz switching frequency. Thank you all for your attention, Best regards, Andrea B.

Why does aoc set ii to 6 when I use high clock frequencies?
I have a simple toy kernel that I want to run at 1000 MHz; it doesn't do much:

    __attribute__((uses_global_work_offset(0)))
    __attribute__((max_global_work_dim(0)))
    __kernel void netsim(__global const volatile float * restrict gl_vm)
    {
        float vm[50000];
        #pragma ii 1
        #pragma ivdep
        #pragma speculated_iterations 64
        for (int i = 0; i < 50000; i++) {
            vm[i] = gl_vm[i];
        }
    }

According to the report (see screenshot), II=6 and latency=927. Why can't the compiler lower the latency and set II to 1 here?

What causes OpenCL to insert arbitration for local memory accesses?
I know FPGA OpenCL is deprecated in favor of oneAPI, but I hope you can help me anyway. I've created an MWE of a kernel for which the compiler inserts arbitration:

    __attribute__((uses_global_work_offset(0)))
    __attribute__((max_global_work_dim(0)))
    __kernel void kmain(uint n_tics, __global const volatile uint * restrict dsts)
    {
        float frontier[100];
        #pragma disable_loop_pipelining
        for (uint i = 0; i < 100; i++) {
            frontier[i] = 0;
        }
        uint nqueue[100];
        uint nqueue_n = 20;
        for (uint t = 0; t < n_tics; t++) {
            for (uint i = 0; i < 100; i++) {
                float tmp = frontier[i];
                frontier[i] = 0;
            }
            for (uint j = 0; j < nqueue_n; j++) {
                uint src = nqueue[j];
                frontier[dsts[src]] += 50;
            }
        }
    }

So first I reset all elements of frontier. Then the simulation loop starts: I read one element from frontier and clear it, and then I add 50 to the values at the indexes given by another variable. I know the kernel reads from uninitialized memory, but that is beside the point (I think). In the report, aoc complains about a "Potentially inefficient configuration", and I can see that it has inserted arbitration circuits (see screenshot). So the question is: why? And how can I fix this memory access pattern to be arbitration-free?

Logic elements utilization in FPGA
Hello All, I have an old design on a Cyclone III with Quartus 10, and the used logic elements (in ALMs) are 20k. Now I have migrated exactly the same design to Quartus 21 and also changed the FPGA to a Cyclone V, and the used logic elements (in ALMs) are now 4k. So the only changes are the FPGA, from Cyclone III to Cyclone V, and the Quartus version, from 10 to 21. Why were the used logic elements (in ALMs) reduced from 20k to 4k? What could have gone wrong? PS: no optimization was done in either Quartus project.
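Returning to the earlier question about aoc 2024.2.1 no longer coalescing 16 float reads: one workaround sometimes tried, sketched here under the assumption that the access pattern allows it (the kernel and identifiers below are illustrative, not from the original post), is to request the wide access explicitly with OpenCL's float16 vector type, so the compiler sees a single 512-bit transfer instead of 16 scalar loads:

```c
// Sketch only: copy16, src, dst, and n are illustrative names.
// A float16 access is 16 floats = 512 bits, matching the DDR width
// that the old compiler (aoc 21.2.0) coalesced to automatically.
__attribute__((uses_global_work_offset(0)))
__attribute__((max_global_work_dim(0)))
__kernel void copy16(__global const float16 * restrict src,
                     __global float16 * restrict dst,
                     uint n)
{
    for (uint i = 0; i < n; i++)
        dst[i] = src[i];   // one 512-bit read and one 512-bit write per iteration
}
```

Whether this restores the exact behavior of the old compiler is something to verify in the new report's memory viewer.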
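On the II=6 question above: if the compiler will not honor `#pragma ii 1` at a 1000 MHz target, one mitigation sometimes tried (a sketch under that assumption; netsim_wide is an illustrative name) is to widen each iteration. Total loop cycles are roughly II times the trip count plus latency, so cutting the trip count by 8x reduces runtime even when II stays above 1:

```c
// Sketch only: gl_vm is reinterpreted as float8 so each iteration moves
// 8 floats with one wide load; vstore8 then scatters the vector into
// the private array (vstoren accepts a private pointer in OpenCL C).
__attribute__((uses_global_work_offset(0)))
__attribute__((max_global_work_dim(0)))
__kernel void netsim_wide(__global const volatile float8 * restrict gl_vm)
{
    float vm[50000];
    #pragma ivdep
    for (int i = 0; i < 50000 / 8; i++) {
        float8 v = gl_vm[i];   // one 256-bit load instead of 8 scalar loads
        vstore8(v, i, vm);     // store 8 floats at vm + i*8
    }
}
```

This does not answer why the compiler chooses II=6 in the first place; it only shrinks the cost of that choice, so the report should be re-checked after the change.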
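On the arbitration question: the Intel FPGA SDK for OpenCL exposes kernel memory attributes such as numbanks, bankwidth, singlepump, and doublepump on on-chip arrays, and double-pumping a RAM (running it at 2x the kernel clock) doubles its effective ports. A hedged sketch of annotating frontier this way follows; whether it removes the arbitration for this exact access pattern is an assumption to verify in the report.

```c
// Sketch only, adapted from the MWE in the arbitration question above.
// The memory/doublepump attributes are from the Intel FPGA SDK for
// OpenCL attribute set; doublepump gives the RAM more effective ports,
// which can avoid arbitrating between the clear loop and the
// read-modify-write loop. Treat this as a starting point, not a
// verified arbitration-free configuration.
__attribute__((uses_global_work_offset(0)))
__attribute__((max_global_work_dim(0)))
__kernel void kmain(uint n_tics, __global const volatile uint * restrict dsts)
{
    float frontier[100] __attribute__((memory, doublepump));

    #pragma disable_loop_pipelining
    for (uint i = 0; i < 100; i++)
        frontier[i] = 0;

    uint nqueue[100];
    uint nqueue_n = 20;
    for (uint t = 0; t < n_tics; t++) {
        for (uint i = 0; i < 100; i++) {
            float tmp = frontier[i];   // read one element...
            frontier[i] = 0;           // ...and clear it
        }
        for (uint j = 0; j < nqueue_n; j++) {
            uint src = nqueue[j];
            frontier[dsts[src]] += 50; // indexed read-modify-write
        }
    }
}
```

An alternative direction is to reduce the number of distinct access sites to frontier, for example by folding the clear into the loop that reads, since fewer load/store sites per memory can also make arbitration unnecessary.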