Hi, is there any trick to using the ACP port from the FPGA under Linux (3.9)? I'm setting the f2h AR and AW CACHE/PROT/USER signals by hand, post-Qsys, but I don't see any indication that the SCU cares about the address I send on f2h_axi_slave: reads go to SDRAM instead of the L1 or L2 cache. Thanks

ACP port under Linux | Altera Community

23 Replies

Altera_Forum
Honored Contributor
11 years ago
The answer turned out to be surprisingly simple. Make sure the physical address the driver passes to the FPGA accelerator is ORed with 0x80000000 ("| 0x80000000"). This puts the address in the ACP range. Otherwise, the address bypasses the SCU.

This also allows one to test performance with and without the ACP by simply changing the address at runtime.

Thanks to LegUp for the solution: http://janders.eecg.toronto.edu/pdfs/euc14.pdf

Note: cache attributes set by hand in the top level HDL after Qsys compile are:

.f2h_ARCACHE (4'hf)
.f2h_ARPROT (3'h0)
.f2h_ARUSER (5'h1f)

.f2h_AWCACHE (4'hf)
.f2h_AWPROT (3'h0)
.f2h_AWUSER (5'h1f)

ACDS is 14.0.
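
The address trick described in this post can be sketched as a small helper. This is only an illustration: the names ACP_WINDOW_BASE and fpga_target_addr are hypothetical, and the sketch assumes a 32-bit physical address and the 0x80000000 offset quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the ACP alias offset quoted in the post above. */
#define ACP_WINDOW_BASE 0x80000000u

/*
 * Returns the address to program into the FPGA master.
 * With use_acp set, the physical address is ORed into the ACP range so the
 * access goes through the SCU; otherwise it is passed through unchanged.
 */
static inline uint32_t fpga_target_addr(uint32_t phys_addr, bool use_acp)
{
    return use_acp ? (phys_addr | ACP_WINDOW_BASE) : phys_addr;
}
```

Keeping the choice in a runtime flag matches the point made above: the same buffer can be measured with and without the ACP just by changing the address handed to the FPGA.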
Altera_Forum
Honored Contributor
10 years ago
Hi Sir,

I am trying to interface a program running under Linux to an FPGA master, and I need to use the ACP port. I changed the cache attributes as you said.
What I am not sure about is which physical address to map in the Linux program, or which address to send to the FPGA master I have developed.

I will appreciate any help you can give me!
Thanks.
Altera_Forum
Honored Contributor
10 years ago
Hi Sir, I am trying to interface a Linux program to an FPGA master using the ACP port. What address do I need to use to ensure reads and writes go through the ACP port?
I am not familiar with using the ACP port at all, so I would appreciate any guidance you can give.
I configured the AXI port cache settings as you said. Is there any example or reference where I can find related information?

Thanks.
Altera_Forum
Honored Contributor
10 years ago
Hi norxander,

For access to ACP port, my driver code looks like (literally):

dma->csr[2] = c->dma_currimage.write_buf_phys_addr | 0x80000000;

The bit-wise OR was the only trick.

However, I found the ACP slow, so I gave up on the ACP port and used the much faster FPGA-SDRAM ports.
I've heard rumors that the ACP port has a design flaw that will be fixed. My notes apply to the Cyclone V SoC 5CSXFC6D6F31C8ES, and you'll find the best performance there by using the FPGA-SDRAM ports and calling dma_sync_single_for_device/cpu in the driver on buffers shared between the FPGA and the CPU.

Best wishes
Altera_Forum
Honored Contributor
10 years ago
Thanks for your quick reply, Sir.

Why did you say the FPGA-SDRAM port is faster than the ACP port? I have read that with the ACP port you can get data that is already cached, so in some cases you don't have to read or write the DDR SDRAM at all.
I am using the DE1-SoC board.
If the FPGA-SDRAM ports will give me better performance, can you explain to me how to use the dma_sync_single_for_device/cpu functions?
I am working on a project where I have a user application that sends some information to the FPGA module I have developed (something like a DMA module, but it doesn't move data in the way *dest++ = *src++). How can I allocate physical memory that I can share between the FPGA and the CPU, and what is the use of the dma_sync... functions?
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---
Why did you say the FPGA-SDRAM port is faster than the ACP port? I have read that with the ACP port you can get data that is already cached, so in some cases you don't have to read or write the DDR SDRAM at all.
--- Quote End ---

If you're using Xilinx Zynq, that might be true. But the Altera SoC ACP architecture ends up being slower in my experience than just using dma_sync functions with FPGA-SDRAM. This is an architecture flaw I believe that might be fixed in future updates to the SoC architecture.

--- Quote Start ---
If the FPGA-SDRAM ports will give me better performance, can you explain to me how to use the dma_sync_single_for_device/cpu functions?
I am working on a project where I have a user application that sends some information to the FPGA module I have developed (something like a DMA module, but it doesn't move data in the way *dest++ = *src++). How can I allocate physical memory that I can share between the FPGA and the CPU, and what is the use of the dma_sync... functions?
--- Quote End ---

Check out https://gnuradio.org/redmine/projects/gnuradio/wiki/zynq for some general resources for this idea. I used https://github.com/jpendlum/user-peripheral-kmod as the basis for my kernel driver for a similar device. It allocates shared buffers between CPU and FPGA peripheral with no special cache management and also has extension for the Zynq ACP.

Basically, I extended the user peripheral code above. It bypasses the cache for any region that its mmap handler marks with pgprot_noncached:

static int user_peripheral_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct user_peripheral_drvdata *d = to_drvdata(filp->private_data);

    if (vma->vm_pgoff == MMAP_REGS) {
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        if (remap_pfn_range(vma, vma->vm_start, d->regs_phys_addr >> PAGE_SHIFT,
                            d->regs_len, vma->vm_page_prot))
            return -EIO;
    ...

When application code calls mmap(), the driver handler above links the application's process memory space to kernel memory. The driver has already allocated the kernel memory that the FPGA device will also read and write.

For my cached buffers, I skip the call to pgprot_noncached so that the memory is cached as needed, and then create an ioctl stub in the driver:

static long my_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct my_drvdata *d = to_drvdata(filp->private_data);

    /* arg selects the buffer; it is unsigned, so only the upper bound is checked */
    if (arg >= MAX_BUFFERS) {
        dev_err(&d->pdev->dev, "Improper argument %lu.\n", arg);
        return -EINVAL;
    }

    switch (cmd) {
    case MY_SYNC_DEVICE:
        /* device now owns buffer to read */
        dma_sync_single_for_device(&d->pdev->dev, d->my_buf[arg].buf_phys_addr,
                                   d->my_buf[arg].buf_len, DMA_TO_DEVICE);
        break;
    case MY_SYNC_CPU:
        /* cpu now owns buffer to read */
        dma_sync_single_for_cpu(&d->pdev->dev, d->my_buf[arg].buf_phys_addr,
                                d->my_buf[arg].buf_len, DMA_FROM_DEVICE);
        break;
    default:
        dev_err(&d->pdev->dev, "Improper ioctl command %u.\n", cmd);
        return -ENOTTY;
    }
    return 0;
}

Now in my application code I can use ioctl to do cache flushing or invalidation.

// make sure cache is flushed, FPGA device reads from buf_no
ioctl(c->fd, MY_SYNC_DEVICE, buf_no);

// now FPGA can process buf_no and write back

// when FPGA finishes, invalidate the cache, CPU will now load memory into
// cache as needed.
ioctl(c->fd, MY_SYNC_CPU, buf_no);
Altera_Forum
Honored Contributor
10 years ago
Thanks for all that information, I will apply it to my project.
I am using this project: https://github.com/bvlc/caffe
The idea is that the FPGA module performs what the im2col function does (https://github.com/bvlc/caffe/blob/master/src/caffe/util/im2col.cpp), translate an input image into a vector so later it can be used to perform some basic matrix operation.
I found that the input data (const Dtype* data_im) is some RAM allocated using malloc (CaffeMallocHost() in caffe, https://github.com/bvlc/caffe/blob/master/include/caffe/syncedmem.hpp), so the idea is to change the CaffeMallocHost function to allocate memory which can be used later by the FPGA module to perform the im2col custom function, so I will need the physical address of that buffer already allocated by CaffeMallocHost().

I read that the dma_sync_single_for_device/cpu is required to sync the data, i.e., flush and invalidate the data cache, so the data is up to date when the CPU or DMA need to handle it.

So the function user_peripheral_mmap gets called when the mmap is called from the user space application? If so, what is the physical address we need to pass to the mmap function in the user app?
Also, in the GitHub link you provided there is a file named devicetree.template. What is the meaning of "reg = <0x40000000 0x20000>"? The DE1-SoC has 1GB DDR3 SDRAM.
Can you please provide sample code for a user app that allocates memory using the approach you describe?

Thanks for all your time yxi95! I appreciate it!
Altera_Forum
Honored Contributor
10 years ago
BTW, I should mention that rocketboards.org has a lot of project examples and forums for the SoC boards, check out http://rocketboards.org/foswiki/view/projects, there might be something newer than my suggestions that is closer to your target. And http://forum.rocketboards.org/ of course has a lot of discussion on these topics.

--- Quote Start ---

So the function user_peripheral_mmap gets called when the mmap is called from the user space application? If so, what is the physical address we need to pass to the mmap function in the user app?

--- Quote End ---

None, just NULL. The offset passed to mmap is being used as a flag for the driver. Here's how the user space application calls it:

d->csr = (unsigned int *) mmap(NULL, d->csr_len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, MMAP_REGS);
if (d->csr == MAP_FAILED) {
    perror("Error mapping buffer");
    return 0;
}

--- Quote Start ---

Also, in the GitHub link you provided there is a file named devicetree.template. What is the meaning of "reg = <0x40000000 0x20000>"? The DE1-SoC has 1GB DDR3 SDRAM.

--- Quote End ---

The first is address (0x40000000), the second is length (0x20000), but you shouldn't have to actually edit the devicetree, the SoC Embedded Suite (https://dl.altera.com/soceds/) should have everything you need. You can use the defaults set up for the right device, i.e. something like embedded/examples/hardware/cv_soc_devkit_ghrd. Rocketboards.org has good defaults for most boards, you might want to check there first if you haven't yet.
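
To make the address/length reading concrete, here is the arithmetic for that reg entry as a minimal sketch. The helper name reg_last_addr is hypothetical, and the sketch assumes one 32-bit cell each for address and length.

```c
#include <stdint.h>

/*
 * Last byte address covered by a devicetree "reg = <base size>" entry,
 * assuming single 32-bit address and size cells and a nonzero size.
 */
static inline uint32_t reg_last_addr(uint32_t base, uint32_t size)
{
    return base + size - 1u;
}
```

For reg = <0x40000000 0x20000>, the length 0x20000 is 128 KiB, so the region runs from 0x40000000 through 0x4001FFFF.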

--- Quote Start ---

Can you please provide sample code for a user app that allocates memory using the approach you describe?

--- Quote End ---

The user-periph driver and app code above assumes the kernel allocates the buffers and the application uses mmap to get the buffers. The application can then pass that buffer pointer around to any algorithm to write to it, make sure the buffer is synced to the device (ioctl), signal the FPGA by writing control registers, wait for completion, make sure the buffer is synced to the CPU (ioctl) and then read the FPGA-computed result right out of the same pointer. That's one approach. So the user app isn't allocating memory, in this approach, but rather borrowing kernel driver buffers which have shared access between CPU and FPGA.
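
The order of operations in that approach can be sketched in user space as follows. This is a sketch under assumptions, not the actual application: the command numbers for MY_SYNC_DEVICE and MY_SYNC_CPU are made up here and would have to match the driver's own definitions, and process_buffer and fpga_run_fn are hypothetical names. The callback stands in for "write the control registers and wait for completion".

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical command numbers; they must match the driver's header. */
#define MY_SYNC_DEVICE _IO('M', 0)
#define MY_SYNC_CPU    _IO('M', 1)

/* Hypothetical callback: start the FPGA and block until it is done. */
typedef int (*fpga_run_fn)(void *ctx);

/*
 * Hands buffer buf_no to the FPGA, runs it, and hands the buffer back.
 * The caller fills the mmap'ed buffer before the call and reads the
 * result from the same pointer afterwards. Returns 0 on success, -1 on error.
 */
static inline int process_buffer(int fd, unsigned long buf_no,
                                 fpga_run_fn run, void *ctx)
{
    /* Flush CPU writes so the FPGA sees them in SDRAM. */
    if (ioctl(fd, MY_SYNC_DEVICE, buf_no) < 0)
        return -1;

    /* Signal the FPGA and wait for completion. */
    if (run != NULL && run(ctx) < 0)
        return -1;

    /* Invalidate stale cache lines before the CPU reads the result. */
    if (ioctl(fd, MY_SYNC_CPU, buf_no) < 0)
        return -1;

    return 0;
}
```

Between the two ioctl calls the CPU should not touch the buffer, since the device owns it during that window.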

But also check out the rocketboards.org projects because there might be newer source code and methods for accelerators than what I describe here.
Altera_Forum
Honored Contributor
10 years ago
Hello again,

Thanks for the references. I took a look at the pages and ended up using the Zynq reference. I already have the driver developed for my FPGA module, and I learned a lot while coding and configuring the environment. :)

Just one more question, you said that you are using the ioctl calls to sync the buffer when the FPGA is going to handle the data, but according to the driver the buffer is not being cached, am I wrong on that?
I see in the driver code the call to pgprot_noncached and in the user application the MAP_SHARED flag when mapping the buffer.

I noticed that in your code (d->my_buf[arg]) you are using an array of buffers, is that correct? then how are you managing the map requests from multiple map calls?

Also, I have asked the Linux system to take only 850M of my DDR3 SDRAM (1G total), so in the kernel driver I need to allocate the last 150M of memory to be used as the buffer. How can I allocate that memory so I can later remap this area to the user application, and still have the physical address available for the FPGA transactions?

Thanks in advance for all your support!
Altera_Forum
Honored Contributor
10 years ago
--- Quote Start ---

Just one more question, you said that you are using the ioctl calls to sync the buffer when the FPGA is going to handle the data, but according to the driver the buffer is not being cached, am I wrong on that?
I see in the driver code the call to pgprot_noncached and in the user application the MAP_SHARED flag when mapping the buffer.

--- Quote End ---

This driver? https://github.com/jpendlum/user-peripheral-kmod/blob/master/user_peripheral.c

It looks to me like they only do pgprot_noncached for the control registers MMAP_REGS. The buffers MMAP_BUFFS are default which would be cached. If the ACP port is used, everything is fine. But if not, dma_sync_single calls would be needed.

--- Quote Start ---

I noticed that in your code (d->my_buf[arg]) you are using an array of buffers, is that correct? then how are you managing the map requests from multiple map calls?

--- Quote End ---

I'm following the scheme used by the above driver which treats a page offset like a flag, maybe a little kludgy. The driver uses

#define MMAP_REGS 0x1
#define MMAP_BUFFS 0x2

And the application does (https://github.com/jpendlum/zynq-fir-filter-example/blob/master/zynq_fir_filter_example.c)

*control_regs = (unsigned int*)mmap(NULL, *control_length, PROT_READ|PROT_WRITE, MAP_SHARED, *fd, 0x1000); // corresponds to MMAP_REGS
if (*control_regs == MAP_FAILED) {
    perror("Error mapping control_regs");
    close(*fd);
    return(-1);
}

*buff = (unsigned int*)mmap(NULL, *buffer_length, PROT_READ|PROT_WRITE, MAP_SHARED, *fd, 0x2000); // corresponds to MMAP_BUFFS
if (*buff == MAP_FAILED) {
    perror("Error mapping buff");
    close(*fd);
    return(-1);
}
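
The link between the 0x1000/0x2000 offsets in the application and the 0x1/0x2 selectors in the driver comes from how mmap() hands its offset to the driver: the offset must be a multiple of the page size, and the kernel stores it in vma->vm_pgoff in units of pages. A minimal sketch of that conversion, where mmap_selector_offset is a hypothetical helper name:

```c
#include <sys/types.h>
#include <unistd.h>

#define MMAP_REGS  0x1
#define MMAP_BUFFS 0x2

/*
 * Converts a driver-side selector (the value the driver compares against
 * vma->vm_pgoff) into the byte offset that user space passes to mmap().
 */
static inline off_t mmap_selector_offset(unsigned int selector)
{
    long page_size = sysconf(_SC_PAGESIZE);

    return (off_t)selector * (off_t)page_size;
}
```

With 4 KiB pages this gives 0x1000 for MMAP_REGS and 0x2000 for MMAP_BUFFS, which are the literal offsets used in the application code above.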

--- Quote Start ---

Also, I have asked the Linux system to take only 850M of my DDR3 SDRAM (1G total), so in the kernel driver I need to allocate the last 150M of memory to be used as the buffer. How can I allocate that memory so I can later remap this area to the user application, and still have the physical address available for the FPGA transactions?

--- Quote End ---

Hmm... I've always allocated all the memory available to the linux system so the kernel has access to all of it. That way, I can allocate as much or as little buffer space in the driver as needed. I'm not sure how to keep memory off-limits to the kernel but still accessible to drivers, is it possible? Sorry, not sure.

Best,
