What sort of output bit rate are you after? Are you generating the clock edges as well as the data?
More particularly, can you stop the clock when there is no data?
I don't know how fast the EPCS memory reads actually are (is it some kind of serial protocol?), but unless you need a very high rate I don't necessarily see the reason to copy the data to SRAM.
I think someone has done an Avalon slave that will (slowly) read EPCS memory directly (there are some references to executing Nios code directly from EPCS), so maybe you could 'just' DMA into a parallel-to-serial converter.
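To make the parallel-to-serial idea a bit more concrete, here is a rough behavioural model in Python (not HDL, and not anything taken from your design). It assumes 8-bit words shifted out MSB first, with the clock only toggling while there is data to send. The names `serialize`, `width` and `msb_first` are made up for illustration.

```python
def serialize(words, width=8, msb_first=True):
    """Yield (clk, data) samples for each word, two samples per bit.

    Assumption for this sketch: data is set up while the clock is low
    and the receiver samples it on the rising edge.
    """
    for word in words:
        bit_order = range(width - 1, -1, -1) if msb_first else range(width)
        for i in bit_order:
            bit = (word >> i) & 1
            yield (0, bit)  # clock low, data set up
            yield (1, bit)  # clock high, receiver samples data
    # No more words: the generator ends, so no further clock edges occur.


samples = list(serialize([0xA5]))
# Bits the receiver would see on each rising edge.
bits = [data for clk, data in samples if clk == 1]
```

If the receiver cannot tolerate a stopped clock, the same loop would have to emit idle or padding bits between words instead of simply ending, which is why the question about stopping the clock matters.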