yahoo2003,
Because the SDRAM controller keeps only one row open at a time, you won't see anything like full bandwidth when DMA'ing from SDRAM to SDRAM (unless the reads and writes happen to land in the same open row). Every time the transfer switches between the source row and the destination row, the controller has to close one row and open the other, and those cycles move no data. If you do a DMA transfer from an on-chip memory to SDRAM, or from SDRAM to on-chip memory, you should get much better performance (approaching one transfer per clock).
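To put rough numbers on the difference, here is a toy cycle-count model in C. It is only a sketch under stated assumptions, not a description of the real controller: the struct, the function names, the row size and the row-switch penalty are all made up for illustration, and it assumes the worst case where reads and writes strictly alternate word by word (in practice the DMA's FIFO lets it group some accesses).

```c
#include <stdint.h>

/* Toy model only: both fields are hypothetical values chosen by the caller,
   not figures taken from any SDRAM controller documentation. */
typedef struct {
    uint32_t words_per_row;      /* words held in one SDRAM row (assumed) */
    uint32_t row_switch_cycles;  /* cost of closing one row and opening another (assumed) */
} sdram_model;

#define NO_OPEN_ROW UINT32_MAX

/* Cost in cycles of one word access; updates the single open row. */
static uint32_t access_word(const sdram_model *m, uint32_t *open_row,
                            uint32_t word_addr)
{
    uint32_t row = word_addr / m->words_per_row;
    uint32_t cycles = 1;                 /* the data transfer itself */
    if (*open_row != row) {              /* row miss: pay the switch penalty */
        cycles += m->row_switch_cycles;
        *open_row = row;
    }
    return cycles;
}

/* SDRAM -> SDRAM: every word costs one SDRAM read and one SDRAM write,
   assumed here to alternate strictly. */
uint32_t copy_sdram_to_sdram(const sdram_model *m, uint32_t src_word,
                             uint32_t dst_word, uint32_t n_words)
{
    uint32_t open_row = NO_OPEN_ROW;
    uint32_t cycles = 0;
    for (uint32_t i = 0; i < n_words; i++) {
        cycles += access_word(m, &open_row, src_word + i);
        cycles += access_word(m, &open_row, dst_word + i);
    }
    return cycles;
}

/* On-chip -> SDRAM: only the writes touch SDRAM; the on-chip reads are
   assumed to overlap with them and cost nothing extra. */
uint32_t copy_onchip_to_sdram(const sdram_model *m, uint32_t dst_word,
                              uint32_t n_words)
{
    uint32_t open_row = NO_OPEN_ROW;
    uint32_t cycles = 0;
    for (uint32_t i = 0; i < n_words; i++) {
        cycles += access_word(m, &open_row, dst_word + i);
    }
    return cycles;
}
```

With an assumed 256-word row and a 6-cycle switch penalty, copying 256 words between two different rows comes out at 3584 cycles in this model, while writing the same 256 words from on-chip memory comes out at 262, which is close to one transfer per clock. If source and destination sit in the same row, the SDRAM-to-SDRAM case also stays near one access per clock.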
Increasing the arbitration priorities of the DMA masters, as AlexS suggests, may also help, but once the arbitration priority exceeds the depth of the DMA's internal FIFO you won't get any more benefit. Let me know if you want to pursue this strategy, and I'll look up the PTF assignment that specifies the FIFO depth.