Re: How to run multiple work groups in parallel using ND Range
Work-items in the same work-group do not run in parallel; they are pipelined. To get work-item-level parallelism, you need to use the SIMD attribute: https://www.intel.com/content/www/us/en/docs/programmable/683846/22-4/specifying-number-of-simd-work-items.html
Multiple work-groups are also automatically pipelined one after the other inside the same compute unit, and the compiler will replicate local memory buffers inside your kernel to accommodate this. If you want work-group-level parallelism, you need to use the num_compute_units() attribute: https://www.intel.com/content/www/us/en/docs/programmable/683846/22-4/specifying-number-of-compute-units.html

Re: [FPGA SDK for OpenCL] Problem with setting multiple compute units
That is exactly what the num_compute_units(X) attribute does. It automatically replicates your compute unit "X" times and distributes work-groups across the copies. At the same time, even a single compute unit can pipeline work-items from different work-groups, so more than one work-group can be in flight inside one compute unit at any given time. Hence, Intel recommends using three times as many work-groups as compute units to maximize compute unit utilization. Usage of this attribute is explained here: https://www.intel.com/content/www/us/en/docs/programmable/683846/22-1/specifying-number-of-compute-units.html
Note that this attribute will NOT change CL_DEVICE_MAX_COMPUTE_UNITS, since there is still one physical FPGA on the board.

Re: FPGA Build fail on Devcloud
That machine only has 16 GB of memory, which is not even remotely enough to synthesize a design for Stratix 10. Doesn't DevCloud have specific nodes for synthesis?

Re: [FPGA SDK for OpenCL] Problem with setting multiple compute units
What exactly are you trying to achieve by that?
An FPGA design is not fixed, and the underlying FPGA architecture has no notion of a "compute unit"; "compute unit" is simply OpenCL terminology that does not necessarily map to anything meaningful on an FPGA. You can always compile and synthesize multiple kernels into one bitstream and run them in parallel in different queues, if that is what you are trying to achieve. There are also ways to automatically create/duplicate compute units in both Single Work-item and NDRange kernels.

Re: FPGA Build fail on Devcloud
It seems the process is getting killed, most likely because the machine is running out of memory. FPGA synthesis requires a very large amount of memory. Does the machine you are running the synthesis on have the amount of memory recommended here? https://www.intel.com/content/www/us/en/docs/programmable/683706/21-4/memory-recommendations.html

Re: DPC++ project built for FPGA emulation takes very long time to implement on CPU
Since you are building in emulation mode, it is expected that the code takes a very long time to execute, because the FPGA hardware is being emulated in software. If you compile for and run your code on an actual FPGA, it will be much faster. Emulation mode only exists to check code correctness; the time the application takes in emulation mode does NOT represent the time it would take on an actual FPGA.

Re: FPGA DDR memory access before OpenCL aocx loading
The host-side APIs are standard OpenCL APIs: you create buffers in the FPGA DDR memory using clCreateBuffer(), copy data from host DDR to FPGA DDR using clEnqueueWriteBuffer(), and copy data from FPGA DDR back to host DDR using clEnqueueReadBuffer(). I recommend studying existing OpenCL books/guides, since these APIs are common to all OpenCL-capable devices and are not limited to Intel FPGAs.
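As a minimal sketch of those host-side calls, the C program below creates a context and a queue, allocates one buffer in device memory, writes to it and reads it back, without ever creating a program or kernel. Picking the first platform and its first device, the buffer size, and the check() helper are assumptions made for illustration, not something taken from the thread. It needs the OpenCL headers and runtime, and a board that has already been set up.

```c
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: abort if an OpenCL call did not return CL_SUCCESS. */
static void check(cl_int err, const char *what)
{
    if (err != CL_SUCCESS) {
        fprintf(stderr, "%s failed with error %d\n", what, (int)err);
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    const size_t n = 1024;                 /* arbitrary element count */
    const size_t bytes = n * sizeof(int);
    cl_int err;

    /* Assumption: the first platform and its first device are the FPGA board. */
    cl_platform_id platform;
    check(clGetPlatformIDs(1, &platform, NULL), "clGetPlatformIDs");
    cl_device_id device;
    check(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL),
          "clGetDeviceIDs");

    /* Context first, then a queue on that context. */
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    check(err, "clCreateContext");
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
    check(err, "clCreateCommandQueue");

    /* Host buffers live in host DDR and come from plain malloc(). */
    int *src = malloc(bytes);
    int *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        fprintf(stderr, "host allocation failed\n");
        return EXIT_FAILURE;
    }
    for (size_t i = 0; i < n; ++i)
        src[i] = (int)i;

    /* Device buffer in the board's DDR memory. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    check(err, "clCreateBuffer");

    /* Blocking host -> device copy, then blocking device -> host copy. */
    check(clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, src, 0, NULL, NULL),
          "clEnqueueWriteBuffer");
    check(clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes, dst, 0, NULL, NULL),
          "clEnqueueReadBuffer");

    printf("round trip %s\n", memcmp(src, dst, bytes) == 0 ? "ok" : "MISMATCH");

    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    free(src);
    free(dst);
    return 0;
}
```

Both copies are blocking (CL_TRUE) to keep the sketch short; a real application would more likely use non-blocking transfers and events.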
To transfer data from host to device or vice versa, you need to create a "context", then a "queue", then host buffers using standard C/C++ allocation (malloc or new), then device buffers using clCreateBuffer(), and finally use clEnqueueWriteBuffer() and clEnqueueReadBuffer() to move data between the host buffers residing in host DDR and the device buffers residing in FPGA DDR. None of these APIs depends on a valid "program" or "kernel", so they can all be called without an aocx file. Slide 54 onward in the following slide deck might help in understanding the flow: https://extremecomputingtraining.anl.gov/files/2018/08/ATPESC_2018_Track-1_8_7-30_315pm_Moawad-FPGA.pdf
Intel's FPGA SDK for OpenCL guides can also be found below:
https://www.intel.com/content/www/us/en/docs/programmable/683188/21-4/pro-edition-getting-started-guide.html
https://www.intel.com/content/www/us/en/docs/programmable/683846/21-4/overview.html
https://www.intel.com/content/www/us/en/docs/programmable/683521/21-4/introduction-to-pro-edition-best-practices.html

Re: FPGA DDR memory access before OpenCL aocx loading
1. No. Without any firmware/binary loaded onto the FPGA, the PCI-E core will not work and you will not be able to access the FPGA DDR memory. In fact, the FPGA board will not be detected at all in that case. As part of the board setup process, you need to flash the FPGA with a base firmware through JTAG to enable the PCI-E and DDR cores and the OpenCL interface. After that, you can access the FPGA DDR memory with or without an .aocx file.
2. If the board has been set up correctly and 'aocl diagnose' passes, yes. All data transfers between host and device are done using host-side API calls. I am not sure what you mean by device-to-device data transfer, though.
If you are trying to move data from one DDR bank on the FPGA board to another DDR bank on the same board, you either have to write a kernel that copies the data through the FPGA (which requires an .aocx file), or you have to copy the data back to the host and then write it to the other bank.
3. Yes. All data transfers between host and device are initiated from the host side and do not require an active kernel.
4. The question is not clear; what "protocol" are you referring to?

Re: Unable to detect device using 'aocl diagnose' command
What is the exact model of the board you are using? If you are using a supported board that comes with an OpenCL BSP, you should be able to set it up yourself by following the documentation.

Re: Unable to detect device using 'aocl diagnose' command
Have you installed the SDK (which also installs the necessary drivers) and programmed the FPGA with a valid firmware to enable the PCI-E core? The following document covers all the necessary steps: https://cdrdv2.intel.com/v1/dl/getContent/709277?fileName=aocl_getting_started-683188-709277.pdf
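Going back to the num_compute_units() thread above: the "three work-groups per compute unit" guidance is simple arithmetic on the host side when choosing the NDRange size. The helpers below are a rough sketch of that arithmetic in plain C; the function names are hypothetical, and treating the quoted ratio as a lower bound is an assumption rather than something the thread states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of work-groups an NDRange launch produces in one dimension.
 * Assumes the global size is a whole multiple of the local size. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* True if the launch supplies at least three work-groups per compute unit,
 * the ratio quoted in the thread (read here as a lower bound). */
bool keeps_compute_units_busy(size_t global_size, size_t local_size,
                              size_t num_compute_units)
{
    return work_group_count(global_size, local_size) >= 3 * num_compute_units;
}

/* Smallest global size that reaches that ratio for a given local size. */
size_t min_global_size(size_t local_size, size_t num_compute_units)
{
    return 3 * num_compute_units * local_size;
}
```

For example, with num_compute_units(4) and a work-group size of 64, min_global_size(64, 4) gives 768 work-items, which is 12 work-groups.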