User Profile

Dr_FPGA

New Contributor

Joined 6 years ago

12 Posts
1 Like received

Contributions

Re: X2go timing out local host connection
Hi Lawrence, I missed the second terminal ssh command. After I followed the exact instructions in section 6.0 of https://github.com/intel/FPGA-Devcloud/tree/master/main/Devcloud_Access_Instructions I was able to connect with X2Go in GUI mode. Thank you.
5 years ago Place Acceleration
9.2K Views
0 likes
1 Comment
X2go timing out local host connection
Hi, I am following instructions similar to those in this thread: https://community.intel.com/t5/Intel-High-Level-Design/X2GO-client-connection-failure-Socket-Disconnected/m-p/656069#M209 However, in my case the X2go connection to localhost times out. Please see the attached picture. I am able to connect to DevCloud with PuTTY in terminal mode. The AWS and NIMBIX GUIs work fine in my environment. I have already disabled all firewalls, both software and on the router, with the same result. Please advise on the next step. Thank you.
Solved
5 years ago Place Acceleration
9.2K Views
0 likes
3 Comments
Re: OpenCL private_copies attribute does not seem to work in 20.1
Hi Anil, Thank you for pointing out the scope of this new attribute. However, our use case is a __local buffer that is copied into FPGA accelerator block memory in one loop, to avoid paying the latency penalty for each access or copy from global memory. The __local buffer values are then used in another loop for computation. This is a very typical use case for FPGAs. It appears that the AOCL offline compiler builds a pipeline with a memory replication factor based on some default parameters, assuming that multiple workgroups access the local buffer(s), without knowing a priori how many workgroups will be launched from the global_size/local_size ratio. The user does not have full control over the generated memory replication factor in this very typical use case, and local memory resources are wasted by a factor of several. I would suggest extending the scope of the private_copies attribute to the whole kernel, on a per-__local-buffer basis, in the next release of the OpenCL compiler, to give users full control over the replication factor of __local buffers. The current remedy is to increase reqd_work_group_size or max_work_group_size, which indirectly changes the default AOCL compiler computation in favor of a smaller memory replication factor.
5 years ago Place Acceleration
2.1K Views
0 likes
0 Comments
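The global_size/local_size ratio mentioned in the post above can be illustrated with a minimal sketch. The sizes below are hypothetical, `num_work_groups` is a name introduced only for this illustration, and the sketch assumes that the global size is an exact multiple of the work-group size. It only counts work-groups for a launch; it does not model how the AOCL compiler chooses a replication factor.

```python
def num_work_groups(global_size, local_size):
    """Number of work-groups a 1-D NDRange launch creates.

    Assumes global_size is an exact multiple of local_size.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    return global_size // local_size


# Hypothetical launch sizes, for illustration only.
for global_size, local_size in [(64, 64), (256, 64), (1024, 64)]:
    groups = num_work_groups(global_size, local_size)
    print(f"global={global_size} local={local_size} -> {groups} work-group(s)")
```

This count is known only on the host at launch time, which is why the offline compiler cannot use it when sizing local memory.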
Re: Cannot connect to arria10 node
I had the same issue using these instructions. I suspect they are outdated, because the node numbers in this script differ from what DevCloud shows with: pbsnodes | grep -B 4 fpga I was able to connect to an FPGA node to compile with the following command: qsub -I -l nodes=1:fpga_compile:ppn=2 -d . I think that to run it you need another node with an actual FPGA: qsub -I -l nodes=1:fpga_runtime:ppn=2 -d .
5 years ago Place Acceleration
1.5K Views
0 likes
0 Comments
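The two qsub commands in the post above differ only in the node property (fpga_compile versus fpga_runtime). As a small sketch, the helper below assembles the same command line from that property. `qsub_interactive` is a hypothetical helper name, not part of any DevCloud tooling, and the code only builds and prints the command; it does not submit a job.

```python
import shlex


def qsub_interactive(node_property, ppn=2, workdir="."):
    """Build the argument list for an interactive qsub request.

    node_property is the node property to request, for example
    "fpga_compile" or "fpga_runtime" as used in the post.
    """
    return ["qsub", "-I", "-l", f"nodes=1:{node_property}:ppn={ppn}", "-d", workdir]


# Print the commands instead of running them, so they can be checked first.
for prop in ("fpga_compile", "fpga_runtime"):
    print(shlex.join(qsub_interactive(prop)))
```

The printed lines can be pasted into a DevCloud login-node terminal, or the argument list can be passed to `subprocess.run` on a machine where qsub is available.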
Re: I cannot build and run my code for Intel PAC platform in Intel DevCloud.
Hi Janani, Now I am really confused. I thought this was THE FPGA FORUM for High Level Design. Please stop this practice of bouncing forum threads. Other users are interested in the solutions to the questions. Just my 2c worth.
5 years ago Place Acceleration
4.5K Views
1 like
0 Comments
Re: OpenCL private_copies attribute does not seem to work in 20.1
Hi HRZ, As I mentioned, the report shows 8 copies both before and after. The same factor applies to the loop of 4, where there is no need to replicate beyond 4. I suspect I have something wrong with the syntax of this attribute. Please note that this is a new attribute in 20.1 and a similar attribute exists in oneAPI, so you may not have seen it or tried it yet.
5 years ago Place Acceleration
2.1K Views
0 likes
2 Comments
Re: Host channels arria 10 pac [DevCloud]
Why not just add software support for host pipes to the Intel PAC card? This should take less than a month; it is just a software update! Our application is confidential at the moment. However, pumping data via host channels has the advantage of needing much less code than handling ping-pong buffers with events, etc. And we are all for less code and more convenience, correct?
5 years ago Place Acceleration
3.1K Views
0 likes
0 Comments
Re: Host channels arria 10 pac [DevCloud]
Same here. I am very interested in getting OpenCL host pipes working with our application. BTW, I tried to get access to the "right Arria 10 cloud" and got yet another DevCloud account without GUI tools and Tcl scripts.
5 years ago Place Acceleration
3.1K Views
0 likes
0 Comments
OpenCL private_copies attribute does not seem to work in 20.1
Hello OpenCL FPGA developers, I have an OpenCL NDRange (64,1,1) kernel with multiple local memories that are each replicated 8 times, which makes this kernel limited by memory size (>100% M20Ks on A10). I have attempted to limit the replication factor by applying the newly introduced attribute described in UG-OCL002 | 2020.04.13 20.1 aocl_programming_guide.pdf, page 41. Example for one of the buffers: __local float __attribute__((private_copies(4))) x[M][N]; However, this attribute does not seem to have the intended effect, and I am stuck with private copies replicated 8 times. I know that reducing replication by a factor of 2 will make my kernel slower, but I could accept a slightly slower kernel in exchange for lower memory use. Moreover, the speed decrease while all these buffers are in use is a small percentage of the overall kernel schedule. Thank you for your input.
5 years ago Place Acceleration
2.1K Views
0 likes
7 Comments
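To make the memory tradeoff in the post above concrete, here is a rough back-of-the-envelope sketch comparing 8 copies with 4 copies of one buffer. The dimensions M and N are hypothetical, since the post does not state them, and `m20k_blocks` is a name introduced only for this illustration. It assumes 20 Kb of capacity per M20K block and ignores banking, port replication, and width configuration, so the compiler's area report will normally show more blocks than this lower bound.

```python
import math

M20K_BITS = 20 * 1024  # capacity of one M20K block in bits


def m20k_blocks(rows, cols, bits_per_elem=32, copies=1):
    """Rough lower bound on M20K blocks for a rows x cols local buffer.

    Assumption: each copy is packed independently into whole blocks.
    Banking, extra ports, and width modes are ignored.
    """
    bits_per_copy = rows * cols * bits_per_elem
    blocks_per_copy = math.ceil(bits_per_copy / M20K_BITS)
    return blocks_per_copy * copies


# Hypothetical dimensions for `__local float x[M][N]`.
M, N = 64, 32
for copies in (8, 4):
    print(f"private copies = {copies}: at least {m20k_blocks(M, N, copies=copies)} M20K blocks")
```

Under these assumptions, halving the number of copies halves the block count for that buffer, which is the saving the post is after.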
Re: Reproducing the same results in Quartus
The Intel OpenCL compilation flow makes several attempts to improve the kernel clock frequency in post_flow.tcl, which calls the PLL frequency adjustment script: source "$sdk_root/ip/board/bsp/adjust_plls.tcl". Running this script may result in different PLL settings, depending primarily on FPGA utilization and speed grade. If nothing has changed between your compiles, including your CPU, this is indeed a strange result. In general, though, you may have a different number of workgroups or compute units in the FPGA. As FPGA utilization grows, the maximum frequency tends to go down, which may explain the different multiply and divide ratios. If your application requires a fixed, repeatable frequency, it is quite easy to change the script in your local copy so that it makes only one attempt. BTW, this will save you some compile time too.
6 years ago Place Acceleration
2.1K Views
0 likes
1 Comment
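The effect of a different multiply and divide ratio on the kernel clock can be sketched with simple arithmetic. The reference clock and the ratios below are hypothetical values chosen for illustration, and `pll_output_mhz` is not a function from adjust_plls.tcl. The sketch only assumes the usual PLL relationship in which the output frequency equals the reference frequency times the multiplier, divided by the combined divider.

```python
from fractions import Fraction


def pll_output_mhz(ref_mhz, multiply, divide):
    """Output frequency in MHz for a given multiply/divide ratio.

    Fractions keep the arithmetic exact. `divide` stands for the
    combined divider (an assumption made for this sketch).
    """
    return Fraction(ref_mhz) * multiply / divide


# Hypothetical settings from two compiles of the same design.
REF_MHZ = 100
for multiply, divide in [(12, 5), (11, 5)]:
    freq = pll_output_mhz(REF_MHZ, multiply, divide)
    print(f"M={multiply} D={divide}: {float(freq):.1f} MHz")
```

Two compiles that close timing at slightly different maximum frequencies would end up with different ratios of this kind, and therefore different kernel clocks.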