The Embedded Design Handbook has a chapter on Avalon-MM optimizations that is worth a look. It's on the Nios II literature page and probably other places too.
Do you have the processor and memory interfaces on different clock domains? I suspect you have asynchronous clock crossing adapters between the CPU and SDRAM, which I wouldn't recommend, since those adapters only let one access through at a time. You would be better off inserting a clock crossing bridge in between or, better yet, operating the processor and SDRAM at the same clock frequency. Processors are very sensitive to read latency, so any clock crossing between the processor and memory may hurt performance rather than improve it.
For example, if my memory could run at up to 150 MHz and the CPU could only hit, say, 125 MHz, it would most likely be best to run them both at 125 MHz. This is a generalization, and it really depends on the algorithms in the code and the cache topology, but more often than not it is the best optimization available.
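
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The cycle counts in it (8 memory clocks for an SDRAM read, 3 destination-domain clocks per crossing in each direction) are made-up assumptions for illustration only, not measured figures for any particular adapter or SDRAM controller, so plug in values from your own system.

```python
def read_latency_ns(cpu_mhz, mem_mhz, mem_read_cycles, crossing_cycles):
    """Rough round-trip read latency seen by the CPU, in nanoseconds.

    crossing_cycles is an ASSUMED synchronizer cost per direction, counted
    in clocks of the destination domain. Pass 0 when both sides share a clock.
    """
    cpu_period_ns = 1000.0 / cpu_mhz
    mem_period_ns = 1000.0 / mem_mhz

    request_crossing = crossing_cycles * mem_period_ns   # CPU -> memory domain
    memory_access = mem_read_cycles * mem_period_ns      # the SDRAM read itself
    response_crossing = crossing_cycles * cpu_period_ns  # memory -> CPU domain
    return request_crossing + memory_access + response_crossing


# Hypothetical cycle counts, chosen only to illustrate the trade-off.
MEM_READ_CYCLES = 8
CROSSING_CYCLES = 3

# Case 1: CPU and SDRAM both at 125 MHz, no clock crossing.
same_clock = read_latency_ns(125, 125, MEM_READ_CYCLES, 0)

# Case 2: CPU at 125 MHz, SDRAM at 150 MHz, crossing in both directions.
crossed = read_latency_ns(125, 150, MEM_READ_CYCLES, CROSSING_CYCLES)

print(f"same clock : {same_clock:.1f} ns per read")
print(f"with crossing: {crossed:.1f} ns per read")
```

With these assumed numbers the faster memory clock doesn't make up for the crossing cost on each read. A smaller crossing penalty or a much longer memory access would shift the balance, which is why the real answer depends on your system and how often the caches miss.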