--- Quote Start ---
Now, why did you say the FPGA-SDRAM port is faster than the ACP port? I have read that with the ACP port you can get data that is already cached, so in some cases you don't have to read/write data to the DDR SDRAM.
--- Quote End ---
If you're using Xilinx Zynq, that might be true. But in my experience the Altera SoC ACP path ends up slower than simply using the dma_sync functions with the FPGA-SDRAM port. I believe this is an architectural flaw that might be fixed in future revisions of the SoC.
--- Quote Start ---
If using the FPGA-SDRAM ports gives better performance, can you explain how to use the dma_sync_single_for_device/cpu functions?
I am working on a project where a user application sends some information to an FPGA module I have developed (something like a DMA module, but it doesn't move data in the way *dest++ = *src++). How can I allocate physical memory that I can share between the FPGA and the CPU? And what is the use of the dma_sync... functions?
--- Quote End ---
Check out https://gnuradio.org/redmine/projects/gnuradio/wiki/zynq for some general resources on this idea. I used https://github.com/jpendlum/user-peripheral-kmod as the basis for my kernel driver for a similar device. It allocates buffers shared between the CPU and the FPGA peripheral with no special cache management, and it also has an extension for the Zynq ACP.
Basically, I extended the user peripheral code above. Its mmap handler bypasses the cache by marking the mapping with pgprot_noncached:
static int user_peripheral_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct user_peripheral_drvdata *d = to_drvdata(filp->private_data);

    /* vm_pgoff tells the driver which region the application asked for */
    if (vma->vm_pgoff == MMAP_REGS) {
        /* mark the mapping uncached so CPU accesses bypass the cache */
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        if (remap_pfn_range(vma, vma->vm_start, d->regs_phys_addr >> PAGE_SHIFT,
                            d->regs_len, vma->vm_page_prot))
            return -EIO;
    ...
When application code calls mmap(), the kernel invokes the driver handler above, which maps kernel memory into the application's address space. The driver has already allocated that memory, and the FPGA device reads and writes the same memory.
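On the application side, the region is selected through the mmap() offset: the kernel stores the byte offset divided by the page size in vma->vm_pgoff, which is the value the handler compares against MMAP_REGS. A minimal sketch of that call, assuming the driver uses small integer region selectors; the helper name map_region is hypothetical and not part of the driver above:

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of the region whose selector the
 * driver will see in vma->vm_pgoff. Returns MAP_FAILED on error, like mmap(). */
void *map_region(int fd, unsigned long region, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return MAP_FAILED;

    /* mmap() takes a byte offset; the kernel converts it to a page offset,
     * so scale the selector up by one page. */
    off_t offset = (off_t)region * (off_t)page;

    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
}
```

The application would open the device node, call map_region(fd, selector, length) once per region, and then read or write the returned pointer directly.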
For my cached buffers, I skip the call to pgprot_noncached so that the memory stays cached, and then add an ioctl stub to the driver:
static long my_ioctl(struct file *filp, unsigned int cmd,
                     unsigned long arg)
{
    struct my_drvdata *d = to_drvdata(filp->private_data);

    /* arg is the buffer number; it is unsigned, so only the upper bound matters */
    if (arg >= MAX_BUFFERS) {
        dev_err(&d->pdev->dev, "Improper argument %lu.\n", arg);
        return -EINVAL;
    }

    switch (cmd) {
    case MY_SYNC_DEVICE:
        /* device now owns the buffer to read: write back dirty cache lines */
        dma_sync_single_for_device(&d->pdev->dev,
                                   d->my_buf.buf_phys_addr, d->my_buf.buf_len,
                                   DMA_TO_DEVICE);
        break;
    case MY_SYNC_CPU:
        /* CPU now owns the buffer to read: invalidate stale cache lines */
        dma_sync_single_for_cpu(&d->pdev->dev,
                                d->my_buf.buf_phys_addr, d->my_buf.buf_len,
                                DMA_FROM_DEVICE);
        break;
    default:
        dev_err(&d->pdev->dev, "Improper ioctl command %u.\n", cmd);
        return -ENOTTY;
    }
    return 0;
}
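MY_SYNC_DEVICE and MY_SYNC_CPU are not defined in the snippet above. A common way to define such commands, in a header shared by the driver and the application, is the kernel's _IO macro, which suits commands that pass a plain value rather than a pointer. A sketch only: the magic character 'm' and the sequence numbers 0 and 1 are placeholders for illustration, not the values from the actual driver:

```c
#include <linux/ioctl.h>

/* Placeholder magic and sequence numbers; pick values that do not collide
 * with other drivers on your system. */
#define MY_IOC_MAGIC   'm'
#define MY_SYNC_DEVICE _IO(MY_IOC_MAGIC, 0)  /* hand buffer to the FPGA */
#define MY_SYNC_CPU    _IO(MY_IOC_MAGIC, 1)  /* hand buffer back to the CPU */
```

Both sides must include the same definitions, otherwise the driver falls through to its default case.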
Now in my application code I can use ioctl to do cache flushing or invalidation.
// make sure cache is flushed, FPGA device reads from buf_no
ioctl(c->fd, MY_SYNC_DEVICE, buf_no);
//now FPGA can process buf_no and write back
// when FPGA finishes, invalidate the cache, CPU will now load memory into
// cache as needed.
ioctl(c->fd, MY_SYNC_CPU, buf_no);
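The calls above ignore the ioctl return value, so a rejected buffer number or command would go unnoticed and the FPGA could read stale data. A small wrapper can surface the error instead. This is a sketch; the name sync_buffer is hypothetical and the command is passed in so the wrapper does not depend on how the driver defines its constants:

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: issue one sync command for buffer `buf_no`.
 * Returns 0 on success or a negative errno value on failure. */
int sync_buffer(int fd, unsigned long cmd, unsigned long buf_no)
{
    if (ioctl(fd, cmd, buf_no) < 0)
        return -errno;
    return 0;
}
```

With it, the flush step becomes something like: if (sync_buffer(c->fd, MY_SYNC_DEVICE, buf_no) != 0), stop and report the error before starting the FPGA.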