I've now enabled burst transfers. The clock crossing bridge supports bursts of at most 256 words (its FIFO can't be any bigger), and for the on-chip memory the maximum is 1024 words. I got the following results, which are already a big improvement: about a factor of 8 with the on-chip memory. Like you said, Daixiwen, that makes sense if Serial RapidIO now sends packets with 256 bytes of data (the maximum) instead of 32 bytes.
I think burst transfers didn't work the last time I tried because I hadn't enabled them for the clock crossing bridge :rolleyes:.
These times are for a write transaction followed by a read transaction, so the throughput of the DMA controller itself is double the numbers below.
I used a for loop to perform the multiple transactions.
Two on-chip memories:
Number of words in hexadecimal format (max 0x2000)...0x400
How many transactions? In decimal format...50000
0x400 * 50000 = 51200000 words = 204800000 bytes = 195.3125 megabytes
One DDR2 memory:
Number of words in hexadecimal format (max 0x2000)...0x100
How many transactions? In decimal format...200000
0x100 * 200000 = 51200000 words = 204800000 bytes = 195.3125 megabytes
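The word/byte/megabyte arithmetic in those two runs can be reproduced with a small C helper. This is a sketch that assumes 4-byte words (which is what 51200000 words = 204800000 bytes implies) and treats a megabyte as 2^20 bytes, the only reading that gives 195.3125; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed word size: 51200000 words = 204800000 bytes implies 4 bytes. */
#define BYTES_PER_WORD 4u

/* Total bytes moved by `transactions` transfers of `words` words each. */
static uint64_t total_bytes(uint32_t words, uint32_t transactions)
{
    return (uint64_t)words * transactions * BYTES_PER_WORD;
}

/* "Megabytes" as used in the post: 1 megabyte = 2^20 bytes. */
static double to_megabytes(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Print the same summary line as the test program. */
static void print_volume(uint32_t words, uint32_t transactions)
{
    assert(words > 0 && transactions > 0);
    uint64_t bytes = total_bytes(words, transactions);
    printf("0x%X * %u = %llu words = %llu bytes = %.4f megabytes\n",
           (unsigned)words, (unsigned)transactions,
           (unsigned long long)((uint64_t)words * transactions),
           (unsigned long long)bytes, to_megabytes(bytes));
}
```

Both runs land on the same total, which is why their times can be compared directly: `print_volume(0x400, 50000)` and `print_volume(0x100, 200000)` both report 204800000 bytes.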
+---------------+-----+-----------+---------------+-----------+
| Section | % | Time (sec)| Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|DDR2 DMA | 8.17| 8.27889| 827888631| 1|
+---------------+-----+-----------+---------------+-----------+
|On-chip DMA | 2.76| 2.79213| 279212507| 1|
+---------------+-----+-----------+---------------+-----------+
Transfer speed DDR2: 195.3125 / 8.27889 = 23.59 megabytes/second
Transfer speed on-chip DMA: 195.3125 / 2.79213 = 69.95 megabytes/second
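The same speeds can be computed straight from the Time (clocks) column by dividing by the performance counter clock. The report itself implies about 100 MHz (827888631 clocks over 8.27889 seconds), so this sketch hard-codes that as an assumption; `PERF_CLOCK_HZ` is a hypothetical name, not a HAL define, and should be set to your own system clock.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter clock, inferred from the report (clocks / seconds ~ 100 MHz). */
#define PERF_CLOCK_HZ 100000000.0

/* Elapsed seconds for a section measured in performance-counter clocks. */
static double clocks_to_seconds(uint64_t clocks)
{
    return (double)clocks / PERF_CLOCK_HZ;
}

/* Throughput in megabytes (2^20 bytes) per second. */
static double megabytes_per_second(double megabytes, uint64_t clocks)
{
    assert(clocks > 0);
    return megabytes / clocks_to_seconds(clocks);
}
```

With 195.3125 megabytes this gives about 23.59 for the DDR2 row (827888631 clocks) and about 69.95 for the on-chip row (279212507 clocks), matching the figures above.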
The set-up times, by which I mean the time spent in these two calls, aren't that big:
alt_dma_txchan_send (txchan, tx_data, length, NULL, NULL);
alt_dma_rxchan_prepare (rxchan, rx_data, length, txrxDone, NULL);
+---------------+-----+-----------+---------------+-----------+
| Section | % | Time (sec)| Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|Write set-up |0.429| 0.12740| 12740253| 20000|
+---------------+-----+-----------+---------------+-----------+
|Read set-up | 0.43| 0.12780| 12780087| 20000|
+---------------+-----+-----------+---------------+-----------+
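One way to check that the set-up is cheap is to turn the table into a per-call cost. A sketch, again assuming the roughly 100 MHz counter clock implied by the first report (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter clock, inferred from the first report (~100 MHz). */
#define PERF_CLOCK_HZ 100000000.0

/* Average clocks spent per call of a measured section. */
static double clocks_per_call(uint64_t total_clocks, uint32_t occurrences)
{
    assert(occurrences > 0);
    return (double)total_clocks / occurrences;
}

/* Same figure in microseconds at the assumed clock frequency. */
static double microseconds_per_call(uint64_t total_clocks, uint32_t occurrences)
{
    return clocks_per_call(total_clocks, occurrences) / PERF_CLOCK_HZ * 1e6;
}
```

For the Write set-up row (12740253 clocks over 20000 calls) that is about 637 clocks, roughly 6.4 µs per call; comparing it with the time of one loop iteration shows what share of the loop is driver overhead.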
So I suppose I can't get a real improvement by writing the DMA controller registers directly instead of using the drivers. That approach also didn't seem to work anyway; it's probably fixable, but not really worth the time.
One question here :): I guess it's time for the scatter-gather DMA controller. Do you think getting that to work is doable in about 60 to 80 hours? It doesn't seem too difficult if I look at the data sheet or e.g. this thread: http://www.alteraforum.com/forum/showthread.php?t=21462&highlight=sgdma or this example: http://www.nioswiki.com/exampledesigns/sgdma, but you never know -,-. Just for the heck of it (and since I already implemented the DMA controller), let me see how fast DMA between on-chip memories can go.
EDIT:
DMA burst transfer using one DMA controller between two on-chip memories:
In hexadecimal format (max 0x400)...0x400
How many transactions?
In decimal format...50000
--Performance Counter Report--
Total Time: 11.1866 seconds (1118663570 clock-cycles)
+---------------+-----+-----------+---------------+-----------+
| Section | % | Time (sec)| Time (clocks)|Occurrences|
+---------------+-----+-----------+---------------+-----------+
|On-chip DMA | 21.1| 2.36401| 236400523| 1|
+---------------+-----+-----------+---------------+-----------+
Transfer speed: 195.3125 / 2.36401 = 82.619 megabytes/second (so, counting both the write and the read transfer, the DMA moves about 82.619 * 2 = 165.238 megabytes/second).
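The doubling just reflects that every loop iteration moves the block twice (write leg and read leg), while the 195.3125 megabytes counts it only once. A minimal sketch of both numbers (function names are illustrative):

```c
#include <assert.h>

/* Throughput counting the payload once, as in the report. */
static double one_way_throughput(double megabytes, double seconds)
{
    assert(seconds > 0.0);
    return megabytes / seconds;
}

/* Each iteration does a write leg and a read leg of the same size,
   so the DMA actually moves twice the counted payload. */
static double combined_throughput(double megabytes_each_way, double seconds)
{
    return 2.0 * one_way_throughput(megabytes_each_way, seconds);
}
```

For the on-chip run this gives about 82.619 and 165.238 megabytes/second respectively.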
With the following code:
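/* Open DMA channels */
<....>
PERF_BEGIN (PERFORMANCE_COUNTER_BASE, SECTION_TO_MONITOR_2); // Start timing section
for (i = 0; i < number_of_transactions; i++)
{
    /* txrx_done_w and txrx_done_r are set in the txrxDone_w/txrxDone_r callbacks
       (interrupt context), so they must be declared volatile for the busy-waits */
    txrx_done_w = 0;
    txrx_done_r = 0;
    /* Write leg: queue the transmit and receive sides, then wait for completion */
    alt_dma_txchan_send (txchan_w, tx_data_w, length, NULL, NULL);
    alt_dma_rxchan_prepare (rxchan_w, rx_data_w, length, txrxDone_w, NULL);
    while (!txrx_done_w);
    /* Read leg: same pattern on the second pair of channels */
    alt_dma_txchan_send (txchan_r, tx_data_r, length, NULL, NULL);
    alt_dma_rxchan_prepare (rxchan_r, rx_data_r, length, txrxDone_r, NULL);
    while (!txrx_done_r);
}
PERF_END (PERFORMANCE_COUNTER_BASE, SECTION_TO_MONITOR_2); // End timing section
/* Close channels */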
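The loop relies on a completion callback flipping a flag that the CPU spins on. Below is a host-side sketch of that handshake that runs on a PC: `mock_dma_transfer` is a hypothetical stand-in that copies the block and then calls the callback, which on the Nios II target the HAL driver would do from the DMA interrupt. The callback type mirrors the shape of the HAL receive-done callback (`void* handle, void* data`), but nothing here is the real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the HAL receive-done callback: (void* handle, void* data). */
typedef void (rx_done_fn)(void* handle, void* data);

/* HYPOTHETICAL stand-in for a DMA transfer: copy, then signal completion. */
static void mock_dma_transfer(const void* from, void* to, size_t length,
                              rx_done_fn* done, void* handle)
{
    memcpy(to, from, length);
    if (done != NULL)
        done(handle, to);
}

/* Written from the callback (an ISR on the target), so it must be volatile
   or the compiler may hoist the load out of the wait loop. */
static volatile int txrx_done;

static void on_done(void* handle, void* data)
{
    (void)handle;
    (void)data;
    txrx_done = 1;
}

/* Run `n` back-to-back transfers the same way the timing loop does. */
static int run_transfers(const char* tx, char* rx, size_t length, int n)
{
    int completed = 0;
    for (int i = 0; i < n; i++) {
        txrx_done = 0;
        mock_dma_transfer(tx, rx, length, on_done, NULL);
        while (!txrx_done)
            ; /* returns at once here: the mock completes synchronously */
        completed++;
    }
    return completed;
}
```

On real hardware the spin actually waits for the interrupt, which is why the per-iteration cost includes both the transfer and the callback latency.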