Altera_Forum
8 years ago

--- Quote Start ---
Well, this trade-off always exists: if your host-to-device transfer takes longer than your compute, you just compute on the host... There is a note in Altera's documents that your OpenCL buffers must be 64-byte aligned to get full DMA performance through PCI-E for host-to-device transfers, but I don't think this applies to the Cyclone SoCs since there is no PCI-E. On the other hand, I was under the impression that you can have shared memory between the ARM and the FPGA on these SoCs, so that everything you malloc on the ARM is directly accessible to the FPGA. You can try passing the host pointer to the FPGA (CL_MEM_USE_HOST_PTR) instead of copying the data, and see what happens.
--- Quote End ---

I found the following passage in the Intel® FPGA SDK for OpenCL™ Programming Guide: "You cannot use the library function malloc or the operator new to allocate physically shared memory. Also, the CL_MEM_USE_HOST_PTR flag does not work with shared memory."

So I think it's not possible to use this with the Cyclone V SoC, because it uses shared memory by default?

I also have a question about the part of the Intel® FPGA SDK for OpenCL™ Programming Guide on p. 95 that explains the use of CL_MEM_ALLOC_HOST_PTR. With this flag I only allocate the memory in the shared memory between CPU and FPGA, but since I'm using the Cyclone V SoC, isn't a buffer created with clCreateBuffer placed in the shared memory by default anyway? So how does this improve the data transfer between host and device? Isn't there a way for the FPGA to use the data from the shared memory directly?

Maybe someone can say what this part of the guide means. Is this the way to go to transfer the data efficiently? But I understand the sense of this.
To transfer data from shared hard processor system (HPS) DDR to FPGA DDR efficiently, include a kernel that performs the memcpy function, as shown below.
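// num_simd_work_items needs reqd_work_group_size; 64 is an assumed example size.
__attribute__((reqd_work_group_size(64, 1, 1)))
__attribute__((num_simd_work_items(8)))
__kernel void mem_stream(__global uint * src, __global uint * dst)
{
    size_t gid = get_global_id(0);
    dst[gid] = src[gid];
}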
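To check on the ARM side that such a copy kernel did what it should, a plain C reference can help. This is only a minimal sketch and not from the guide: the names mem_stream_ref and first_mismatch are hypothetical, and it assumes the OpenCL uint type maps to uint32_t on the host. Each loop iteration stands in for one work-item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side reference for a mem_stream-style copy kernel.
 * One loop iteration plays the role of one work-item (gid).
 * Hypothetical helper, not part of the SDK. */
void mem_stream_ref(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        dst[gid] = src[gid];
}

/* Returns the index of the first element that differs,
 * or n if both buffers hold the same n elements. */
size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return i;
    return n;
}
```

After reading the destination buffer back from the device, calling first_mismatch(expected, readback, n) and comparing the result against n shows whether every element arrived, and if not, where the first difference is.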