If you want high throughput, you will need to use a polled implementation. With polling, I think I was able to get the SPI master and slave talking to each other at 10 MHz; however, that was the only thing the Nios II was doing in my system, and I remember having to put delays in.
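To illustrate what a polled transfer looks like, here is a rough sketch that can be run on a host PC. Everything in it is an assumption for illustration: `fake_spi_t` is a one-byte loopback standing in for the SPI core, the `spi_*` accessors are hypothetical, and the status bit positions are placeholders that should be checked against your core's register map. On real hardware the accessors would become register reads and writes.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed status bit positions; verify against the actual SPI core. */
#define SPI_STATUS_TRDY 0x40u /* transmitter can accept a byte */
#define SPI_STATUS_RRDY 0x80u /* receiver holds a byte */

/* Hypothetical host-side stand-in for the SPI core: a one-byte loopback. */
typedef struct {
    uint8_t  rx;
    uint32_t status;
} fake_spi_t;

static void fake_spi_init(fake_spi_t *s)
{
    s->rx = 0;
    s->status = SPI_STATUS_TRDY; /* always ready to transmit in this model */
}

static uint32_t spi_read_status(fake_spi_t *s)
{
    return s->status;
}

static void spi_write_tx(fake_spi_t *s, uint8_t b)
{
    s->rx = b;                     /* loopback: the byte sent is the byte received */
    s->status |= SPI_STATUS_RRDY;
}

static uint8_t spi_read_rx(fake_spi_t *s)
{
    s->status &= ~SPI_STATUS_RRDY; /* reading clears "receive ready" */
    return s->rx;
}

/* Polled full-duplex transfer: no interrupts, the CPU spins on status bits. */
static void spi_transfer_polled(fake_spi_t *s, const uint8_t *tx,
                                uint8_t *rx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        while (!(spi_read_status(s) & SPI_STATUS_TRDY)) { /* wait for TX space */ }
        spi_write_tx(s, tx[i]);
        while (!(spi_read_status(s) & SPI_STATUS_RRDY)) { /* wait for RX data */ }
        rx[i] = spi_read_rx(s);
    }
}
```

The point of the structure is that there is no ISR entry or exit per byte, only two tight loops, which is why polling can keep up where the interrupt-driven version cannot. The cost is that the processor does nothing else while the transfer runs.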
The problem you are seeing is similar to what happens when a timer interrupts the processor at a high frequency: the processor ends up spending its time jumping in and out of the ISR instead of performing real work.
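A quick back-of-the-envelope calculation shows how fast this gets out of hand. The helper below is only a sketch; `isr_load_percent` is a made-up name, and the clock rate, interrupt rate, and cycles per interrupt used in the comments are illustrative guesses, not measurements from any real system.

```c
#include <stdint.h>

/*
 * Percentage of CPU cycles consumed by interrupt servicing.
 * The result can exceed 100, which means the processor cannot keep up.
 * cpu_hz must be nonzero.
 */
static uint32_t isr_load_percent(uint32_t cpu_hz, uint32_t irq_hz,
                                 uint32_t cycles_per_irq)
{
    uint64_t busy_cycles_per_sec = (uint64_t)irq_hz * cycles_per_irq;
    return (uint32_t)((busy_cycles_per_sec * 100u) / cpu_hz);
}

/*
 * Illustrative numbers (assumptions):
 *   100 MHz CPU, 10 kHz interrupt rate, 100 cycles per interrupt
 *     -> isr_load_percent(100000000u, 10000u, 100u)   gives 1
 *   100 MHz CPU, 1.25 MHz interrupt rate (one interrupt per byte at
 *   10 Mbit/s), 100 cycles per interrupt
 *     -> isr_load_percent(100000000u, 1250000u, 100u) gives 125
 */
```

Once the figure approaches 100, there is nothing left for real work, and above 100 the interrupts simply cannot all be serviced.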
If the protocol between the external processor and the SOPC Builder system is not complicated, then perhaps you can implement an SPI slave component that masters Avalon. That way software has no impact on your throughput. SPI doesn't have any flow control, though, so you have to be careful not to overflow the SPI slave in that case (for example, if the master port can't reach the memory it's filling because it is losing arbitration).
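To get a feel for the overflow risk, here is a small cycle-by-cycle model of that situation. It is a sketch under assumptions, not a description of any real component: a hypothetical receive FIFO sits between the SPI slave and the Avalon master, a byte arrives every `cycles_per_byte` clock cycles whether or not there is room, and the master drains one byte per cycle except during a single arbitration stall.

```c
#include <stdint.h>

/*
 * Returns the number of bytes lost to overflow.
 * All parameters are hypothetical; cycles_per_byte must be nonzero.
 */
static uint32_t simulate_overflow(uint32_t fifo_depth, uint32_t cycles_per_byte,
                                  uint32_t stall_start, uint32_t stall_len,
                                  uint32_t total_cycles)
{
    uint32_t fill = 0;
    uint32_t dropped = 0;

    for (uint32_t t = 0; t < total_cycles; t++) {
        /* SPI side: a byte shows up on schedule; there is no way to say "wait". */
        if (t % cycles_per_byte == 0) {
            if (fill < fifo_depth) {
                fill++;
            } else {
                dropped++;
            }
        }

        /* Avalon side: write one byte to memory per cycle unless stalled. */
        int stalled = (t >= stall_start) && (t < stall_start + stall_len);
        if (!stalled && fill > 0) {
            fill--;
        }
    }
    return dropped;
}
```

With a byte every 8 cycles and a 64-cycle stall, a 4-deep FIFO loses 4 bytes while an 8-deep FIFO loses none, so the buffering has to be sized for the worst-case arbitration delay you expect.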