User Profile

Mickleman

Occasional Contributor

Joined 4 years ago

15 Posts, 1 Like received, 2 Solutions

Contributions

Re: oneAPI FPGA Compile Node Errors
Hi @lkljucaric, I had the same problem, which was resolved as follows on advice from Intel. In the build script, add a line at the beginning to ensure that the correct version of Python is used. Here is the script I am now using:
    #!/bin/bash
    export PATH=/glob/intel-python/python2/bin/:${PATH}
    source /opt/intel/inteloneapi/setvars.sh > /dev/null 2>&1
    make hw -f Makefile.fpga
Hope this works for you as well.
Kind regards
Marcus
4 years ago Place Acceleration
2.3KViews
0likes
0Comments
Re: I cannot build and run my code for Intel PAC platform in Intel DevCloud.
I have the same problem. Have you found a solution yet, Arthur? The solution suggested above by Janani only applies to the emulator.
Marcus
4 years ago Place Acceleration
3.8KViews
0likes
0Comments
Build for Stratix on DevCloud has missing dependencies
Hi, I have successfully built an application for Arria 10 which I want to try on Stratix 10. I have added the necessary compiler directive but it fails to build because of a missing dependency in the environment. Is there a workaround for this? Here is the full output of the build:
    dpcpp -O2 -g -std=c++17 -fintelfpga a1.o a2.o -o lk.fpga -Xshardware -Xsboard=intel_s10sx_pac:pac_s10
    quartus_sh: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory
    Error: The patches required to compile for the target board (0.05dcp) is not installed for the following Quartus: /glob/development-tools/versions/oneapi/2021.3/inteloneapi/intelfpgadpcpp/2021.3.0/QuartusPrimePro/19.2/quartus/bin/quartus_sh
    dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
    make: *** [Makefile.fpga:26: lk.fpga] Error 1
Thanks
Marcus
4 years ago Place Acceleration
1.7KViews
1like
5Comments
Re: Failure to fit an Arria 10 design through insufficient LABs when no problem in reports
Hi, with insight from an Intel FPGA expert I have now solved this issue. The problem is that the reports are misreporting the percentage of MLABs that the design is using. The compiler error message correctly indicates that I am asking for too many MLABs, whereas the corresponding Area Analysis report says I am using only 65% of them. My solution is to check my MLAB use against the known MLAB capacity of the FPGA rather than trust the report.
Marcus
4 years ago Place Acceleration
1.2KViews
0likes
0Comments
Re: FPGA compile blocks all other development
Thank you for your reply, Nurina. My apologies - I omitted to mention that this is about using oneAPI on the DevCloud. Effectively I can only access one FPGA compile/compute node at a time, whereas I used to be able to access 4, and 2 is the minimum for efficient development.
Kind regards
Marcus
4 years ago Place Acceleration
1.4KViews
0likes
0Comments
FPGA compile blocks all other development
Hi, FPGA compiles typically take 6 to 24 hours. During that time I obviously wish to continue testing previous versions and developing new ones using the quick report-level compiles. As of today neither of these is possible until the background compile has finished - I simply cannot access a second compile/run node. What do I need to do to be able to do this again?
Regards,
Marcus
4 years ago Place Acceleration
1.4KViews
0likes
4Comments
Failure to fit an Arria 10 design through insufficient LABs when no problem in reports
Hi, I have a design for a kernel intended for Arria 10. I can compile and successfully run the design with both 1 and 2 copies of the kernel. There are still plenty of resources available, so I am now trying to compile 3 copies of the kernel. The reports for this show predicted resource use as follows:
    ALUTs: 23%
    FFs: 10%
    RAMs: 48%
    MLABs: 5%
    DSPs: 5%
But the compilation fails with the error:
    Error (170048): Selected device has 20774 RAM location(s) of type LAB. However, the current design needs more than 20774 to successfully fit. The current design uses 445783 RAM location(s) of type LAB.
I cannot see from the reports where the 445,783 LABs come from. And surely this number would have been around 300,000 in the compilation that was successful. Could someone point me in the right direction to find this excessive LAB use?
Kind regards
Marcus
Solved
4 years ago Place Acceleration
1.3KViews
0likes
2Comments
Re: Achieving parallel execution of loop on FPGA
Hi @HRZ, thank you for your time helping me to find a solution to this problem - it is very much appreciated. The reason for the II of 6 is not a mystery - it is, as you say, a (false) LD/ST memory dependency. If I compile the last example with just one of the loops, the II is scheduled at 1 and the code gives the correct answer. So the compiler is able to respect the ivdep in the face of the LD/ST dependency. So why can't it do the same on the fused loop? On the subject of the i,j recovery, you are right that this could be done differently (as it is in the full design), but it is not relevant to the issue being explored here.
4 years ago Place Acceleration
2.5KViews
0likes
0Comments
Re: Achieving parallel execution of loop on FPGA
Thanks @MGRAV I'm not sure I quite understand what you are suggesting. If you could give me a version of the example changed as you propose I will try it out and feed back the results. Regards
4 years ago Place Acceleration
2.5KViews
0likes
1Comment
Re: Achieving parallel execution of loop on FPGA
Hi again MGRAV and HRZ, I have further developed the example to avoid the false memory dependency and manually unrolled the loop. This allows the compiler to automatically fuse the two loops, thus achieving the desired concurrency. BUT even though both loops carry the ivdep, the resulting fused loop nevertheless has an II of 6 (as if the ivdeps had been ignored). Here is the code:
    const int ITEM_LENGTH = 10000;
    const int GROUP_SIZE = 10;
    uint16_t a[GROUP_SIZE][ITEM_LENGTH];
    uint16_t b[GROUP_SIZE][ITEM_LENGTH];
    [[intel::ivdep]]
    for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++) {
        int i = k / GROUP_SIZE;
        int j = k - i * GROUP_SIZE;
        if ( i == 0 ) a[j][i] = j;
        else a[j][i] = a[j][i-1] + i;
    }
    [[intel::ivdep]]
    for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++) {
        int i = k / GROUP_SIZE;
        int j = k - i * GROUP_SIZE;
        if ( i == 0 ) b[j][i] = j;
        else b[j][i] = b[j][i-1] + i;
    }
I'm at a loss. Why can't the fused loop respect the ivdep?
4 years ago Place Acceleration
2.5KViews
0likes
5Comments