Forum Discussion
- Altera_Forum
Honored Contributor
I assume you are talking about the DMA controller that shows up in Qsys. That DMA can move data from memory to memory, but it only buffers a single descriptor. This means that once the DMA finishes moving data, you have to send it a new descriptor before it will move any more, so there is a lot of stop-and-go with non-SG DMAs.
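To make the stop-and-go point concrete, here is a toy software model (not driver code, and not the core's real register layout) that treats the DMA's descriptor buffer as a small FIFO. The `dma_desc` fields, `MAX_DEPTH`, and the `depth` parameter are illustrative assumptions. With `depth == 1`, the host cannot queue a second transfer until the first one completes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: what to move, from where, to where. */
typedef struct {
    uint32_t src;
    uint32_t dst;
    uint32_t len;
} dma_desc;

#define MAX_DEPTH 8

/* Toy model of a descriptor buffer as a ring-buffer FIFO.
 * depth == 1 mimics a DMA that can hold only one descriptor at a time. */
typedef struct {
    dma_desc slots[MAX_DEPTH];
    size_t depth;  /* usable slots, 1..MAX_DEPTH */
    size_t head;   /* slot of the descriptor currently being processed */
    size_t count;  /* descriptors queued, including the active one */
} desc_fifo;

void fifo_init(desc_fifo *f, size_t depth) {
    f->depth = (depth == 0 || depth > MAX_DEPTH) ? 1 : depth;
    f->head = 0;
    f->count = 0;
}

/* Host side: returns false when the buffer is full, i.e. the host must wait. */
bool fifo_push(desc_fifo *f, dma_desc d) {
    if (f->count == f->depth)
        return false;
    f->slots[(f->head + f->count) % f->depth] = d;
    f->count++;
    return true;
}

/* DMA side: the active transfer finishes and frees its slot. */
bool fifo_complete(desc_fifo *f) {
    if (f->count == 0)
        return false;
    f->head = (f->head + 1) % f->depth;
    f->count--;
    return true;
}
```

With a depth of 1, every `fifo_push()` after the first fails until `fifo_complete()` runs, which is the stop-and-go pattern; a deeper buffer lets the host queue several transfers ahead of time.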
The mSGDMA (modular scatter-gather DMA) can buffer multiple descriptors and handles ST-->MM, MM-->ST, and MM-->MM transfers. The biggest difference is that the host doesn't have to wait for the mSGDMA to finish one transfer before telling it what to do next: the mSGDMA buffers multiple descriptors and moves on to the next transfer as soon as the current one completes. In terms of hardware resources the mSGDMA is larger, but it's also around 10 years newer, so it supports many more features than the old DMA. If you explain what you want the DMA for, I can recommend which one to use.
- Altera_Forum
Honored Contributor
I have a big image (~50 MB) in Linux, and I want to send it to SDRAM, process it in hardware, and send it back to Linux.
- Altera_Forum
Honored Contributor
Can your hardware handle the data arriving as a stream? If so, you might not need to copy it temporarily to SDRAM at all: a DMA can pull the data out of memory and stream it to your hardware over Avalon-ST, and another DMA can then accept the results over Avalon-ST and write them back to memory.
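As a rough software analogy of that streaming arrangement (plain C standing in for the hardware; `transform_fn`, `invert_pixels`, and the 256-byte buffer are hypothetical), the data is pulled from the source buffer in chunks, passed through the processing step, and written straight to the destination, so no intermediate full-frame copy is needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-chunk processing step, standing in for the FPGA logic. */
typedef void (*transform_fn)(const uint8_t *in, uint8_t *out, size_t n);

/* Analogy of MM->ST DMA -> hardware -> ST->MM DMA: read a chunk from the
 * source, transform it, write it to the destination, and move on.
 * Only one chunk is "in flight" at a time. */
void stream_process(const uint8_t *src, uint8_t *dst, size_t total,
                    size_t chunk, transform_fn fn) {
    uint8_t buf[256];                          /* stands in for the stream FIFO */
    if (chunk == 0 || chunk > sizeof buf)
        chunk = sizeof buf;
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? total - off : chunk;
        fn(src + off, buf, n);                 /* "hardware" consumes n, produces n */
        memcpy(dst + off, buf, n);             /* "write master" stores the results */
    }
}

/* Example transform: invert 8-bit pixel values. */
void invert_pixels(const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(255u - in[i]);
}
```

The sketch assumes each chunk produces exactly as many bytes as it consumes; a transform that changes the data size would need the input and output sides to be driven separately.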
- Altera_Forum
Honored Contributor
No, unfortunately the data cannot come in as a stream. It has to be placed in SDRAM.
- Altera_Forum
Honored Contributor
The old DMA is quite limited, especially when it comes to bursting, so if throughput is important you may want to go with the mSGDMA for that reason alone. If you leave out the mSGDMA's descriptor-fetching engine, the two have similar front ends: the host just writes a few registers and sets a go bit to start the data movement. The burst issue with the old DMA, which I don't think was ever fixed, is that once you enabled bursting, you were limited to transferring only a single burst of data. I bring this up because you posted this thread in the SoC section, so I'm assuming you are DMA'ing data from the HPS memory space, which has an AXI NoC tuned for bursts rather than single-beat transactions.
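To put a rough number on that limitation: if each programmed transfer can cover only one burst, the number of times the host has to reprogram the DMA is the transfer size divided by the bytes per burst, rounded up. The burst length and bus width used below are hypothetical example values, not the old DMA's actual limits.

```c
#include <stdint.h>

/* Number of separate DMA programming operations needed to move `bytes`
 * when each one is limited to a single burst of `burst_beats` beats,
 * each `bytes_per_beat` wide. Rounds up to cover a partial last burst.
 * Returns 0 if the burst size is 0 (invalid input). */
uint64_t transfers_needed(uint64_t bytes, uint32_t burst_beats,
                          uint32_t bytes_per_beat) {
    uint64_t per_burst = (uint64_t)burst_beats * bytes_per_beat;
    if (per_burst == 0)
        return 0;
    return (bytes + per_burst - 1) / per_burst;
}
```

For example, a 50 MiB image (52,428,800 bytes) on an assumed 64-bit (8-byte) master with 16-beat bursts gives 52,428,800 / 128 = 409,600 separate transfers, each requiring the host to write the registers and set the go bit again.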
Where does the big image originate? In other words, is it already in main memory, or is it stored in a flash file system? Also, are you really talking about an HPS system, or is this perhaps a Nios II system?
- Altera_Forum
Honored Contributor
Yes, I really am talking about an HPS system. I'm running Linaro Linux with my C application. The application is a server that receives images from a client over Ethernet. The images are not stored in a flash file system; they are in main memory, ready to be DMA'd to the FPGA. (After the image processing, I will want to DMA the data back to the HPS.)
I've attached my Qsys file. I would be grateful if you could take a look at it and tell me whether the DMA will work. This is my first HPS/FPGA project and I'm a little lost on this topic.
- Altera_Forum
Honored Contributor
I don't see a problem with this at all: the HPS receives an image and dumps it into a dedicated memory space, and then you can trigger a DMA to push the data to the hardware for processing (even using backpressure, if you want). A second DMA can then pull the results back into RAM. I'm using the mSGDMA now and having no issues with it, as long as you avoid park mode and Quartus versions earlier than 15.1.1.
Do you need the entire image in the FPGA, or just single lines? There are ways to optimize this, and you can get a lot of throughput if the design is planned well. I haven't opened the file yet because the forum isn't letting me, but I'll take a look later.
- Altera_Forum
Honored Contributor
For some reason I can't get at the file either. But I agree with derim: if you break the image processing down into small portions of the frame, you might get a big speedup.
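One way to break the frame into small portions is to hand the DMA one descriptor per strip of rows. The sketch below only computes the (address, length) pair for each strip; the row-major, unpadded layout, the `strip_desc` type, and the `make_strips` helper are hypothetical and meant for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One DMA job: a contiguous byte range within the frame buffer. */
typedef struct {
    uint32_t addr;   /* physical start address of the strip */
    uint32_t len;    /* strip length in bytes */
} strip_desc;

/* Split a row-major frame at `base` into strips of `rows_per_strip` rows.
 * Fills at most `max_out` entries and returns the number of strips needed
 * (the last strip may be shorter). Assumes no row padding. */
size_t make_strips(uint32_t base, uint32_t width, uint32_t height,
                   uint32_t bytes_per_pixel, uint32_t rows_per_strip,
                   strip_desc *out, size_t max_out) {
    if (rows_per_strip == 0)
        return 0;
    uint32_t stride = width * bytes_per_pixel;   /* bytes per row */
    size_t n = 0;
    for (uint32_t row = 0; row < height; row += rows_per_strip, n++) {
        uint32_t rows = (height - row < rows_per_strip) ? height - row
                                                        : rows_per_strip;
        if (n < max_out) {
            out[n].addr = base + row * stride;
            out[n].len = rows * stride;
        }
    }
    return n;
}
```

Each resulting strip could then be queued as its own descriptor, so the FPGA can start working on the first strip while later ones are still being transferred.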
The reason I asked earlier whether you can process the data as a stream is this: if you instantiate the blocks inside the mSGDMA (dispatcher, read master, and write master) individually, you can place your hardware accelerator between the read and write masters and perform a memory-to-memory transfer through it, as long as the number of bytes going into the block equals the number coming out. If the amount of input and output data isn't balanced, you would need to use two DMAs so that you can control them independently. This is what I mean by a "transform + transfer" type of operation:
HPS SDRAM --> DMA read master --> your video transform logic --> DMA write master --> HPS SDRAM
With a topology like that, the operation just looks like a memory move to the rest of the system. It's also self-scheduling, because the DMA won't write the results to memory until they have been processed, which makes scheduling much simpler.
- Altera_Forum
Honored Contributor
It's strange that you can't open the file; I have no problem with it. I've attached screenshots from Qsys. How should I set the parameters of the mSGDMA? The SDRAM is 64 MB.
I understand your solution and it's fine, but some of my calculations are quite specific and can't be done on a stream: I need data from different parts of the image. So what I want to do is transfer the data from HPS memory to the FPGA RAM, run the FPGA modules, and send the data back to HPS memory. That clearly requires two mSGDMA modules. I suggest we focus on the first stage, sending the data to the FPGA, where I need some technical help. If the Qsys configuration is OK, I then need to write the code in my C application, and that is my biggest problem: how do I run the DMA from C code?
- Altera_Forum
Honored Contributor
This example is probably worth looking at. It's a bare-metal program, but it gives you an idea of what has to happen at a low level to communicate with the DMA core: https://www.altera.com/support/support-resources/design-examples/soc/fpga-to-hps-bridges-design-example.html
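Under Linux, the bare-metal register accesses have to go through a user-space mapping instead. One common (if blunt) approach is to `mmap()` `/dev/mem`, which requires root. Because `mmap()` needs a page-aligned offset, the helper below splits a physical address into an aligned base and an offset within the page. This is a sketch under assumptions: on Cyclone V the lightweight HPS-to-FPGA bridge starts at 0xFF200000, but the offsets of the mSGDMA slaves within it come from your own Qsys address map, and `map_phys`/`split_phys` are hypothetical helper names.

```c
#define _FILE_OFFSET_BITS 64          /* allow offsets above 2 GiB on 32-bit ARM */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Split a physical address into a page-aligned base (for mmap's offset
 * argument) and the remaining offset within that page.
 * page_size must be a power of two. */
void split_phys(uint64_t phys, uint64_t page_size,
                uint64_t *aligned, uint64_t *offset) {
    *aligned = phys & ~(page_size - 1);
    *offset = phys - *aligned;
}

/* Map `len` bytes of physical address space starting at `phys` through
 * /dev/mem and return a pointer to `phys` itself, or NULL on failure.
 * Requires root. Sketch only: the mapping is never unmapped here. */
volatile uint32_t *map_phys(uint64_t phys, size_t len) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    uint64_t aligned, offset;
    split_phys(phys, (uint64_t)ps, &aligned, &offset);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *base = mmap(NULL, len + offset, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, (off_t)aligned);
    close(fd);   /* the mapping stays valid after the fd is closed */
    if (base == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)((uint8_t *)base + offset);
}
```

Usage would look like `map_phys(0xFF200000u + csr_offset, 32)`, where `csr_offset` is a placeholder for whatever Qsys assigned. A kernel driver (or UIO) is the cleaner long-term option, but `/dev/mem` is the quickest way to poke registers while bringing up the design.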
The DMAs in that example perform ST-to-MM and MM-to-ST transfers, which is why their descriptors don't contain both a source and a destination address.
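For an MM-to-MM mSGDMA, the host side roughly comes down to writing a descriptor into the dispatcher's descriptor slave, with the control word (carrying the Go bit) written last, and then watching the CSR status register. The sketch below assumes the standard (non-extended) descriptor format and the word offsets and bit positions as I recall them from the Embedded Peripherals IP User Guide, so verify them against the guide for your Quartus version. The `csr` and `desc` pointers would come from mapping the two slaves' addresses in your Qsys system. Also note that `src_phys`/`dst_phys` must be physical addresses of physically contiguous memory the DMA can reach; a buffer from `malloc()` in a Linux process is neither, so a reserved memory region or a kernel driver is usually needed on the HPS side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mSGDMA register layout (standard descriptor format).
 * Check these against the Embedded Peripherals IP User Guide. */
#define MSGDMA_CSR_STATUS        0u         /* CSR word 0: status */
#define MSGDMA_STATUS_BUSY       (1u << 0)
#define MSGDMA_STATUS_DESC_FULL  (1u << 2)

#define MSGDMA_DESC_READ_ADDR    0u         /* descriptor word 0 */
#define MSGDMA_DESC_WRITE_ADDR   1u         /* descriptor word 1 */
#define MSGDMA_DESC_LENGTH       2u         /* descriptor word 2 (bytes) */
#define MSGDMA_DESC_CONTROL      3u         /* descriptor word 3 */
#define MSGDMA_CTRL_GO           (1u << 31)

/* Queue one MM->MM transfer. Returns false if the descriptor buffer is full.
 * For MM->ST the write address is not used; for ST->MM the read address is not. */
bool msgdma_push(volatile uint32_t *csr, volatile uint32_t *desc,
                 uint32_t src_phys, uint32_t dst_phys, uint32_t len) {
    if (csr[MSGDMA_CSR_STATUS] & MSGDMA_STATUS_DESC_FULL)
        return false;
    desc[MSGDMA_DESC_READ_ADDR] = src_phys;
    desc[MSGDMA_DESC_WRITE_ADDR] = dst_phys;
    desc[MSGDMA_DESC_LENGTH] = len;
    desc[MSGDMA_DESC_CONTROL] = MSGDMA_CTRL_GO;   /* control word last commits it */
    return true;
}

/* True while the dispatcher still has a transfer in progress. */
bool msgdma_busy(volatile uint32_t *csr) {
    return (csr[MSGDMA_CSR_STATUS] & MSGDMA_STATUS_BUSY) != 0;
}
```

A 50 MB image could go in as a single descriptor if the core is configured with a large enough maximum transfer length, or as several smaller descriptors otherwise; polling `msgdma_busy()` (or enabling the completion interrupt instead) tells you when the data has landed in the FPGA-side SDRAM.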