And here's the final table for those who would like to know, before I strip the whole thing down again and add scatter-gather DMAs :).
Transferring 195.3125 megabytes.
On-chip memory uses bursts of 1024 words. DDR2 memory uses bursts of 256 words.
--Performance Counter Report--
+---------------+-----------+---------------+-----------+
| Section       | Time (sec)|  Time (clocks)|Occurrences|
+---------------+-----------+---------------+-----------+
|DDR2<->DDR2    |    6.40292|      640292166|          1|
+---------------+-----------+---------------+-----------+
|DDR<->SRIO     |    6.75203|      675203128|          1|
+---------------+-----------+---------------+-----------+
|OC->SRIO->OC   |    2.72834|      272833895|          1|
+---------------+-----------+---------------+-----------+
|OC<->OC        |    2.69376|      269375544|          1|
+---------------+-----------+---------------+-----------+
Transfer speeds, in the same order as the table rows:
1. DDR2<->DDR2: 195.3125 / 6.40292 = 30.5037 megabytes/second
2. DDR<->SRIO: 195.3125 / 6.75203 = 28.9265 megabytes/second
3. OC->SRIO->OC: 195.3125 / 2.72834 = 71.5866 megabytes/second
4. OC<->OC: 195.3125 / 2.69376 = 72.5055 megabytes/second
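For anyone who wants to redo these numbers from the raw clock counts, here is a minimal Python sketch of the same arithmetic. The clock counts are copied from the report. The 100 MHz counter clock is my assumption, inferred from the clocks/seconds ratio in the table (640292166 clocks in 6.40292 s). The names `CLOCKS`, `CLOCK_HZ` and `throughput_mb_per_s` are made up for illustration.

```python
# Recompute transfer speeds from the performance counter clock counts.
TRANSFER_MB = 195.3125  # amount transferred per run, from the post

# Clock counts copied from the Performance Counter Report.
CLOCKS = {
    "DDR2<->DDR2": 640292166,
    "DDR<->SRIO": 675203128,
    "OC->SRIO->OC": 272833895,
    "OC<->OC": 269375544,
}

# Assumption: the counter runs at 100 MHz (inferred from clocks / seconds).
CLOCK_HZ = 100_000_000


def throughput_mb_per_s(clocks, megabytes=TRANSFER_MB, clock_hz=CLOCK_HZ):
    """Return megabytes per second for a transfer that took `clocks` cycles."""
    seconds = clocks / clock_hz
    return megabytes / seconds


if __name__ == "__main__":
    for section, clocks in CLOCKS.items():
        print(f"{section:<14} {throughput_mb_per_s(clocks):8.4f} MB/s")
```

Dividing the on-chip result by the DDR2 result with the same function gives the slowdown factor directly, which may be handy when comparing against the scatter-gather version later.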
The DDR2 transfers are much slower for a few reasons: bursts of only 256 words, the clock-crossing bridge, and probably the bus-width adapters.