What are the source and destination memories? What are the source and destination data widths? And what maximum burst count is set for those memories and for the SGDMA?
The design example shows very limited performance because of the following:
a) The source and destination memories are the same, so reads and writes share one memory's bandwidth (i.e. throughput is cut in half).
b) The arbiter lets the read and write masters take turns accessing the memory in bursts of only 2 transactions, which thrashes the SDRAM (SDRAM performs best with long runs of back-to-back sequential accesses).
So really, the way the design example is set up for SDRAM probably gives close to the worst-case performance possible.
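To put rough numbers on the two points above, here is a toy model in Python. It is only a sketch: the function names and the 6-cycle switch penalty are my own assumptions for illustration, not figures from the design example or from any SDRAM datasheet. It assumes every burst pays a fixed overhead when the arbiter hands the memory to the other master, and that a copy within one memory moves each word twice (one read, one write).

```python
def bus_efficiency(burst_beats: int, switch_overhead_cycles: int) -> float:
    """Fraction of memory cycles that actually transfer data.

    Assumes each burst of `burst_beats` data cycles is followed by a fixed
    `switch_overhead_cycles` penalty (hypothetical value standing in for
    precharge/activate/turnaround when the other master takes over).
    """
    if burst_beats <= 0 or switch_overhead_cycles < 0:
        raise ValueError("burst_beats must be > 0 and overhead must be >= 0")
    return burst_beats / (burst_beats + switch_overhead_cycles)


def shared_memory_copy_fraction(burst_beats: int, switch_overhead_cycles: int) -> float:
    """Copy rate as a fraction of peak bandwidth when the source and
    destination are the same memory: reads and writes split the useful
    cycles, so the copy proceeds at half the bus efficiency."""
    return bus_efficiency(burst_beats, switch_overhead_cycles) / 2


ASSUMED_SWITCH_OVERHEAD = 6  # cycles; illustrative assumption only

for beats in (2, 8, 32, 128):
    eff = bus_efficiency(beats, ASSUMED_SWITCH_OVERHEAD)
    copy = shared_memory_copy_fraction(beats, ASSUMED_SWITCH_OVERHEAD)
    print(f"burst={beats:4d}  bus efficiency={eff:.3f}  copy rate={copy:.3f} of peak")
```

With that assumed penalty, bursts of 2 leave only a quarter of the cycles moving data, and the same-memory copy ends up at an eighth of the peak rate. Longer bursts push the efficiency towards 1, which is why the burst count and arbitration settings matter so much here. The real penalty depends on your SDRAM timing and controller, so treat the numbers as a trend, not a prediction.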