You could put some SignalTap probes on the DMA masters to see what's going on. The slow speed could be explained by the DMA controller not using bursts on the transfers. In that case I think that if the read and write masters both fight for access to the DDR SDRAM, they will each get mostly single-cycle operations, and you lose a lot of time to memory latency.
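To give a rough feel for why single-cycle accesses hurt so much, here is a very simplified model. It assumes every access pays a fixed latency of L clock cycles before data starts flowing, then moves B words at one word per cycle. Both L and B are illustrative symbols, and the numbers are made up, not measured on any real DDR controller:

```latex
\text{efficiency} \approx \frac{B}{B + L}
\qquad
\begin{aligned}
L = 10,\ B = 1  &: \quad \tfrac{1}{11} \approx 9\% \\
L = 10,\ B = 16 &: \quad \tfrac{16}{26} \approx 62\%
\end{aligned}
```

A real controller also has refresh, bank and row effects, and arbitration overhead, so treat this only as an order-of-magnitude intuition; SignalTap will show you what the actual latency is.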
You should try to enable bursts, but from a quick read of the documentation it seems that in that case the DMA transfer length mustn't be higher than the burst count, so you would have to split your test into multiple DMA transfers.
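Splitting the test into several transfers could look something like the sketch below. This is only an outline under assumptions: `dma_copy_chunked`, `dma_chunk_fn` and `memcpy_chunk` are hypothetical names, not part of any Altera driver, and `max_chunk_bytes` stands for whatever per-transfer limit you work out from the burst count and data width in the documentation. The callback is where your real "start one DMA transfer and wait for it to finish" code would go; here a plain `memcpy` stands in for it so the splitting logic can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: start ONE transfer of len_bytes and block
 * until it completes. Returns 0 on success. Not a real driver API. */
typedef int (*dma_chunk_fn)(uintptr_t src, uintptr_t dst,
                            size_t len_bytes, void *ctx);

/* Split a transfer of total_bytes into pieces of at most max_chunk_bytes.
 * Returns the number of chunks issued, or -1 on error. */
long dma_copy_chunked(uintptr_t src, uintptr_t dst, size_t total_bytes,
                      size_t max_chunk_bytes, dma_chunk_fn start_chunk,
                      void *ctx)
{
    assert(start_chunk != NULL);
    if (max_chunk_bytes == 0) {
        return -1;                      /* a zero limit would never finish */
    }

    long chunks = 0;
    size_t done = 0;
    while (done < total_bytes) {
        size_t remaining = total_bytes - done;
        size_t len = remaining < max_chunk_bytes ? remaining : max_chunk_bytes;
        if (start_chunk(src + done, dst + done, len, ctx) != 0) {
            return -1;                  /* stop at the first failed chunk */
        }
        done += len;
        chunks++;
    }
    return chunks;
}

/* Software stand-in for the DMA engine (assumption, for host testing only):
 * copies with memcpy and counts calls through ctx, which points to a size_t. */
int memcpy_chunk(uintptr_t src, uintptr_t dst, size_t len_bytes, void *ctx)
{
    memcpy((void *)dst, (const void *)src, len_bytes);
    if (ctx != NULL) {
        (*(size_t *)ctx)++;
    }
    return 0;
}
```

On the target you would replace `memcpy_chunk` with a function that programs the DMA controller for one transfer and waits for completion. Keep in mind that the per-transfer setup cost is then paid once per chunk, so it is worth timing the whole loop rather than a single chunk.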
If you have enough on-chip memory you could try doing transfers between the DDR SDRAM and the on-chip memory. You should have fewer latency problems in that case, and it should give you a better idea of the DMA's performance when transferring data from main RAM to a peripheral.