Forum Discussion

Altera_Forum
Honored Contributor
15 years ago

Max throughput of SOPC DMA?

Hello,

Currently I have a system with DMA transfers and DDR2 SDRAM (and a lot of other stuff). I'm using the standard SOPC Builder DMA controller. The system runs at 100 MHz, and the DDR2 SDRAM runs at 125 MHz through a clock-crossing bridge. If I transfer 32 megabytes in one go, I only get a transfer speed of 41.5670 megabytes/second.

I was wondering whether speeds this low are normal for the DMA controllers in SOPC Builder.

The main issue for me is that I use the DMA controllers to send data to my Serial RapidIO core. When I send data to the core (from either on-chip memory or the DDR2 memory, the latter being slower of course), I get a throughput of around 10 megabytes/second, but I want it to be closer to 1000 megabytes/second.

I know I could try the SG-DMA, which should be faster since it is supposed to sustain speeds of up to 10 Gbps, but I don't see why it would be THAT much faster for only one transfer.

tx_data = (void*)ALTMEMDDR_1_BASE;
rx_data = (void*)(ALTMEMDDR_1_BASE + 0x1FFFFF8);
length = 0x1FFFFF8;

txchan = alt_dma_txchan_open(DMA_TESTER_NAME);
rxchan = alt_dma_rxchan_open(DMA_TESTER_NAME);

PERF_BEGIN(PERFORMANCE_COUNTER_BASE, SECTION_TO_MONITOR_3); // Start timing section
txrx_done = 0;
alt_dma_txchan_send(txchan, tx_data, length, NULL, NULL);
alt_dma_rxchan_prepare(rxchan, rx_data, length, txrxDone, NULL);
while (!txrx_done); // txrxDone() must set txrx_done, which should be declared volatile
PERF_END(PERFORMANCE_COUNTER_BASE, SECTION_TO_MONITOR_3); // End timing section

alt_dma_txchan_close(txchan);
alt_dma_rxchan_close(rxchan);

--Performance Counter Report--
Total Time: 20.8909 seconds (2089086210 clock-cycles)
+---------------+-------+------------+---------------+-------------+
| Section       |   %   | Time (sec) | Time (clocks) | Occurrences |
+---------------+-------+------------+---------------+-------------+
| DDR2 to DDR2  |  3.69 |    0.76984 |      76983631 |           1 |
+---------------+-------+------------+---------------+-------------+

Transfer speed DDR2 to DDR2 = 31.9999885559082 MB / 0.76984 s = 41.5670 megabytes/second

Thanks in advance.

21 Replies

    Altera_Forum
    Honored Contributor

    Turn off "Force burst alignment". What is probably happening is that the DMA is posting many bursts of length 1, which is quite inefficient.
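    To see why bursts of 1 hurt, here is a simplified, hypothetical model of how a transfer gets split into bursts. It treats "Force burst alignment" the way this reply describes it: single-word bursts until the address reaches a burst boundary. The model does not come from the DMA controller's documentation or RTL. Addresses and lengths are in words, and `count_bursts()` is an illustrative name.

    ```c
    #include <stdio.h>

    /* Count the bursts needed to move `words` words starting at word address
     * `addr`, with bursts of at most `max_burst` words.
     * aligned != 0 models "Force burst alignment" as described in this reply:
     * single-word bursts until addr reaches a max_burst boundary. */
    static unsigned long count_bursts(unsigned long addr, unsigned long words,
                                      unsigned long max_burst, int aligned)
    {
        unsigned long bursts = 0;
        while (words > 0) {
            unsigned long len = max_burst;
            if (aligned && (addr % max_burst) != 0)
                len = 1;                 /* not on a boundary yet: burst of 1 */
            if (len > words)
                len = words;             /* last, partial burst */
            addr  += len;
            words -= len;
            bursts++;
        }
        return bursts;
    }

    int main(void)
    {
        /* Hypothetical case: 1024 words, max burst 64, start 5 words past a boundary. */
        printf("alignment off: %lu bursts\n", count_bursts(5, 1024, 64, 0));
        printf("alignment on : %lu bursts\n", count_bursts(5, 1024, 64, 1));
        return 0;
    }
    ```

    In this example the aligned case posts 75 bursts instead of 16, and each of them pays its own command and arbitration cost.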

    The burst alignment feature is meant for SDRAM, which has the concept of a wrapping burst. Since on-chip memory and SRIO don't use wrapping bursts, it's best to disable it so that the DMA can post full bursts from the beginning.

    The large burst sizes are not helping either, since the write master can't start writing until it has enough data to complete a full burst. (It doesn't start early because a burst locks the arbiter, which could lead to system performance problems.) So the burst reads are posted, the data trickles through the read data FIFO into the write data FIFO, and only when the FIFO holds enough data does the write burst begin. That is the initial overhead of the DMA, and if the host can't feed it new descriptors fast enough, this overhead is paid again for every transfer.
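    The start-up cost described above can be expressed as a simple efficiency model. Each transfer pays a fixed number of cycles before data starts moving at one word per clock. This is a simplification, and the 100-cycle start-up figure below is a made-up placeholder rather than a measured value. Substitute whatever simulation shows for your system. `dma_efficiency()` is a hypothetical helper.

    ```c
    #include <stdio.h>

    /* Simplified model: each DMA transfer pays a fixed start-up cost
     * (read latency plus filling the FIFO with one full write burst) before
     * data flows at one word per cycle. Returns bus efficiency in [0, 1]. */
    static double dma_efficiency(unsigned long total_words,
                                 unsigned long n_transfers,
                                 unsigned long startup_cycles)
    {
        double data_cycles = (double)total_words;
        double overhead    = (double)n_transfers * (double)startup_cycles;
        return data_cycles / (data_cycles + overhead);
    }

    int main(void)
    {
        const unsigned long words   = 8192;  /* 32 kB of 32-bit words */
        const unsigned long startup = 100;   /* HYPOTHETICAL start-up cost, cycles */
        unsigned long n;

        /* Same total data, split into 1, 4, 16 and 64 transfers. */
        for (n = 1; n <= 64; n *= 4)
            printf("%2lu transfer(s): %.1f%% efficient\n",
                   n, 100.0 * dma_efficiency(words, n, startup));
        return 0;
    }
    ```

    The model shows the trend: with a fixed start-up cost, splitting the same data into more transfers lowers efficiency. That is why keeping the DMA fed with descriptors matters.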

    The best on-chip RAM to on-chip RAM performance comes with bursting disabled. With bursting disabled, a 32-bit data width, and a 32 kB transfer, I'd expect the copy to take approximately 8200 clock cycles. I have been able to get around 95% efficiency copying data to and from the same SDRAM. As the transfer size increases I have seen 97% utilization of the SDRAM, at which point the limit is the memory rather than the DMA.
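    The ~8200-cycle estimate is the byte count divided by the data width, assuming one word per clock with no bursting overhead. The helper below makes that explicit and also turns a cycle count observed in simulation into a utilization figure. The function names are hypothetical, and the 8200-cycle input in `main` is only an illustrative value, not a measurement.

    ```c
    #include <stdio.h>

    /* Cycles needed if exactly one data word moves per clock (no overhead). */
    static unsigned long ideal_cycles(unsigned long bytes, unsigned width_bits)
    {
        unsigned long bytes_per_word = width_bits / 8;
        return (bytes + bytes_per_word - 1) / bytes_per_word;  /* round up */
    }

    /* Fraction of clocks that carried data, given a measured cycle count. */
    static double utilization(unsigned long bytes, unsigned width_bits,
                              unsigned long measured_cycles)
    {
        return (double)ideal_cycles(bytes, width_bits) / (double)measured_cycles;
    }

    int main(void)
    {
        const unsigned long bytes = 32UL * 1024UL;  /* 32 kB transfer */
        const unsigned width = 32;                  /* 32-bit data path */

        printf("ideal: %lu cycles\n", ideal_cycles(bytes, width));
        /* 8200 is an illustrative "measured" count, not a real result. */
        printf("utilization at 8200 cycles: %.1f%%\n",
               100.0 * utilization(bytes, width, 8200));
        return 0;
    }
    ```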

    Normally when I'm trying to figure out efficiency problems, I just simulate the transfer. Once you see what is happening in the fabric, the memories, and the DMA, the problem usually becomes very clear.