Usually the problem with fmax has to do with the number of masters trying to arbitrate for a slave, and here you have 5. I am assuming that you are working with the Cyclone family, because this should not be a problem for the Stratix families. The best way to approach this would be to place several of the masters behind a registered bridge. However, that bridge needs to support bursts for the best efficiency. I am working on that right now and hope to have it finished within a week. Since the Nios is running at a different clock rate, it will be using the standard clock-crossing logic that comes with the Nios. This is very inefficient from a throughput point of view. The clock-crossing FIFOed bridge is the best for this.
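To see why the standard clock-crossing logic hurts throughput, here is a toy cycle-count model (my own illustration, not Altera's actual implementation; the synchronizer depths and fill latency are assumptions) comparing a handshake-style crossing, which pays a synchronizer round trip on every transfer, against a FIFO-based bridge, which streams one transfer per cycle after an initial fill:

```python
def handshake_crossing_cycles(n_transfers, sync_stages=2):
    """Toy model of a handshake clock-crossing adapter: each transfer
    pays a request/acknowledge round trip through the synchronizer
    flops, so transfers cannot be pipelined back to back.
    (sync_stages=2 is an assumed, typical synchronizer depth.)"""
    per_transfer = 2 * sync_stages + 2  # req sync + ack sync + handshake
    return n_transfers * per_transfer

def fifo_crossing_cycles(n_transfers, fill_latency=4):
    """Toy model of a FIFO-based clock-crossing bridge: after an
    assumed initial fill latency, one transfer completes per cycle."""
    return fill_latency + n_transfers

print(handshake_crossing_cycles(64))  # 384 cycles for 64 transfers
print(fifo_crossing_cycles(64))       # 68 cycles for the same 64
```

The exact numbers depend on the clock ratio and synchronizer depth, but the shape of the result is the point: the handshake cost scales per transfer, the FIFO cost is mostly a one-time latency.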
In fact, this week the bridges have been going through a major revamp and testing to improve performance and fix bugs. Hopefully within a week I will have them posted, and then they should help.
Now it may be that the real slowdown is your masters. Currently DDR2 supports bursts of 2 and 4, if I remember correctly. You will be adding extra logic in the switch fabric if your master's max burst is 128. What SOPC Builder does is add logic to break that burst up into smaller bursts that match those of the slave. This extra logic can be slowing the fabric down; I have noticed this with the PCI core. So by simply putting a registered bridge in front of this master, you can do the burst size reduction in one clock and then the arbitration in the second clock. All it costs is one cycle of latency. The current registered bridge can help with that. And then, when the updated version with burst support is available, you set the bridge to have a burst size equal to that of the slave and your efficiency goes back up.
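The burst adaptation itself is simple to picture. Here is a small sketch of what the fabric's burst-reduction logic has to compute (my own illustrative function, not SOPC Builder code; the 4-byte beat size is an assumption), splitting one master burst into slave-sized sub-bursts:

```python
def split_burst(start_addr, burst_len, slave_max_burst, beat_bytes=4):
    """Break one master burst into sub-bursts no longer than the
    slave's maximum burst length. Returns (address, length) pairs,
    one per sub-burst the slave actually sees."""
    sub_bursts = []
    addr = start_addr
    remaining = burst_len
    while remaining > 0:
        chunk = min(remaining, slave_max_burst)
        sub_bursts.append((addr, chunk))
        addr += chunk * beat_bytes  # advance by the beats just issued
        remaining -= chunk
    return sub_bursts

# A 128-beat master burst against a slave limited to bursts of 4
# turns into 32 separate sub-bursts, each needing its own slot:
subs = split_burst(0x0000, 128, 4)
print(len(subs))  # 32
```

Every one of those 32 sub-bursts goes back through arbitration, which is exactly where moving that work behind a registered bridge pays off.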
The other option is to add the registered bridge right in front of the DDR2 core. It really all depends on where the critical path causing the slowdown is.
Hope this helps.
Longshot