I would consider writing an Avalon master driven from the external micro via the 8-bit interface. You'll need to do multiple cycles to set up the 32-bit address and read/write data registers before/after requesting the actual cycle (and maybe poll for cycle completion).
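To make the multi-cycle sequence concrete, here is a rough host-side sketch in C. The register map (REG_ADDR0, REG_DATA0, REG_CTRL, REG_STATUS), the command codes, and the bus_write8()/bus_read8() helpers are all made up for illustration; they are not from any real IP core. The bus helpers are backed by a tiny in-memory model so the sketch runs on a PC. On real hardware they would be the micro's 8-bit bus accesses.

```c
#include <stdint.h>

/* Hypothetical byte-wide register map of the bridge (an assumption). */
enum {
    REG_ADDR0  = 0x0,  /* address bytes 0..3, least significant first */
    REG_DATA0  = 0x4,  /* data bytes 0..3, least significant first */
    REG_CTRL   = 0x8,  /* write CTRL_READ or CTRL_WRITE to start a cycle */
    REG_STATUS = 0x9   /* bit 0 set while the Avalon cycle is in progress */
};
enum { CTRL_READ = 1, CTRL_WRITE = 2, STATUS_BUSY = 1 };

/* Stand-in for the FPGA side: 16 byte registers and 256 bytes of memory. */
static uint8_t  sim_regs[16];
static uint32_t sim_mem[64];

static uint32_t sim_get32(unsigned reg)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)sim_regs[reg + i] << (8 * i);
    return v;
}

static void sim_put32(unsigned reg, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        sim_regs[reg + i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical 8-bit bus write; the model completes the cycle at once. */
static void bus_write8(unsigned reg, uint8_t val)
{
    sim_regs[reg & 15] = val;
    if (reg == REG_CTRL) {
        uint32_t idx = (sim_get32(REG_ADDR0) / 4) % 64;
        if (val == CTRL_WRITE)
            sim_mem[idx] = sim_get32(REG_DATA0);
        else if (val == CTRL_READ)
            sim_put32(REG_DATA0, sim_mem[idx]);
    }
}

/* Hypothetical 8-bit bus read. */
static uint8_t bus_read8(unsigned reg)
{
    return sim_regs[reg & 15];
}

/* Load a 32-bit value into four consecutive byte registers. */
static void put_reg32(unsigned reg, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        bus_write8(reg + i, (uint8_t)(v >> (8 * i)));
}

/* Collect a 32-bit value from four consecutive byte registers. */
static uint32_t get_reg32(unsigned reg)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)bus_read8(reg + i) << (8 * i);
    return v;
}

/* Poll the busy bit; give up after a bounded number of tries. */
static int wait_idle(void)
{
    for (int n = 0; n < 1000; n++)
        if (!(bus_read8(REG_STATUS) & STATUS_BUSY))
            return 0;
    return -1;
}

/* One Avalon write: set up address and data, request, poll. */
int avalon_write32(uint32_t addr, uint32_t data)
{
    put_reg32(REG_ADDR0, addr);
    put_reg32(REG_DATA0, data);
    bus_write8(REG_CTRL, CTRL_WRITE);
    return wait_idle();
}

/* One Avalon read: set up address, request, poll, collect data. */
int avalon_read32(uint32_t addr, uint32_t *data)
{
    put_reg32(REG_ADDR0, addr);
    bus_write8(REG_CTRL, CTRL_READ);
    if (wait_idle() != 0)
        return -1;
    *data = get_reg32(REG_DATA0);
    return 0;
}
```

With this layout a single 32-bit write costs nine byte writes plus the status polls, so an auto-incrementing address register would be worth considering for bulk loads.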
Expose the Nios soft reset lines through an Avalon slave as well. (I did a 32-bit wide register with separate write-to-set and write-to-clear addresses, with a mask, and the masked bits then ORed together to interrupt the external host...)
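As a behavioural model only (not HDL, and not my actual register), the set/clear scheme could look like the C below. The struct and function names are hypothetical, and it assumes the interrupt is simply "any bit that is both set and enabled in the mask".

```c
#include <stdint.h>

/* Behavioural model of the control register (names are hypothetical). */
struct ctrl_reg {
    uint32_t bits;      /* current register value, e.g. bit 0 = Nios reset */
    uint32_t irq_mask;  /* which bits may raise the host interrupt */
};

/* Write to the "set" address: each 1 bit in val sets that bit. */
void ctrl_write_set(struct ctrl_reg *r, uint32_t val)
{
    r->bits |= val;
}

/* Write to the "clear" address: each 1 bit in val clears that bit. */
void ctrl_write_clear(struct ctrl_reg *r, uint32_t val)
{
    r->bits &= ~val;
}

/* Interrupt line: OR of all bits that are set and enabled by the mask. */
int ctrl_irq(const struct ctrl_reg *r)
{
    return (r->bits & r->irq_mask) != 0;
}
```

The point of separate set and clear addresses is that neither the host nor the Nios needs a read-modify-write, so the two sides cannot trample each other's bits.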
This all has the advantage that you can use the same interface to dump out the memory (and I/O registers) while the system is running. Continuously refreshed displays (every 100ms or longer will do) are useful for debug.
You can then quite probably load directly from the ELF program file using the 'program headers'.
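A minimal sketch of that kind of loader, assuming a 32-bit little-endian ELF image already read into memory. It walks the program headers and copies each PT_LOAD segment to its physical address (p_paddr; whether you want that or p_vaddr depends on your link), zero-filling the part where memsz exceeds filesz. The names elf32_load() and mem_write() are made up; mem_write() writes into a local array here, where the real thing would go through the Avalon master. Fields are read bytewise so it works on a big-endian host too.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback that writes len bytes at target address addr.
 * buf == NULL means "fill with zeros" (used for .bss-style tails). */
typedef int (*write_fn)(uint32_t addr, const uint8_t *buf, uint32_t len,
                        void *ctx);

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Load all PT_LOAD segments of an ELF32 little-endian image. */
int elf32_load(const uint8_t *img, size_t size, write_fn wr, void *ctx)
{
    if (size < 52 || memcmp(img, "\177ELF", 4) != 0)
        return -1;
    if (img[4] != 1 || img[5] != 1)   /* ELFCLASS32, ELFDATA2LSB */
        return -1;

    uint32_t phoff     = le32(img + 28);  /* e_phoff */
    uint16_t phentsize = le16(img + 42);  /* e_phentsize */
    uint16_t phnum     = le16(img + 44);  /* e_phnum */
    if (phentsize < 32)
        return -1;

    for (uint16_t i = 0; i < phnum; i++) {
        uint64_t off = (uint64_t)phoff + (uint64_t)i * phentsize;
        if (off + 32 > size)
            return -1;
        const uint8_t *ph = img + off;
        if (le32(ph) != 1)                /* p_type != PT_LOAD */
            continue;

        uint32_t p_offset = le32(ph + 4);
        uint32_t p_paddr  = le32(ph + 12);
        uint32_t p_filesz = le32(ph + 16);
        uint32_t p_memsz  = le32(ph + 20);
        if (p_filesz > p_memsz || (uint64_t)p_offset + p_filesz > size)
            return -1;

        if (p_filesz && wr(p_paddr, img + p_offset, p_filesz, ctx) != 0)
            return -1;
        if (p_memsz > p_filesz &&
            wr(p_paddr + p_filesz, NULL, p_memsz - p_filesz, ctx) != 0)
            return -1;
    }
    return 0;
}

/* Stand-in target memory; a real loader would write via the bridge. */
static uint8_t target_mem[256];

int mem_write(uint32_t addr, const uint8_t *buf, uint32_t len, void *ctx)
{
    (void)ctx;
    if ((uint64_t)addr + len > sizeof target_mem)
        return -1;
    if (buf)
        memcpy(target_mem + addr, buf, len);
    else
        memset(target_mem + addr, 0, len);
    return 0;
}
```

Hold the Nios in reset while loading and release it afterwards.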
I actually extract the Nios code/data blocks and convert them into PPC data objects, which then get linked into a PPC Linux application! The app is also linked with the symbol table from the Nios image.
(We do have a PCIe slave, but I used a PIO->Avalon master block when testing the code on some alternate hardware.)