The only setup needed to get the Nios II CPU running C code is to set the %sp and %gp registers. You might also want to set %et (if you use interrupts).
I set those directly in the linker script, then jump to my C code.
With my own makefiles and linker script (I've posted chunks of it in the past) it is possible to get almost zero startup overhead.
I run with pure code in tightly coupled instruction memory and all data (including .rodata) in tightly coupled data memory.
Since we are a PCIe slave (to a PPC on the same PCB), code download and diagnostic reads of memory can be done without any of the Altera boot schemes.
The JTAG download can also load such images; it seems to use the ELF program headers.
The register set for the JTAG UART is also defined, so you can probably write small functions to write strings and integers to it.
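
As a rough sketch of what such functions might look like (not my actual code): this assumes the standard JTAG UART core layout, with the data register at offset 0 and the control register at offset 4, where the top 16 bits of control report the free space in the write FIFO. The names jtag_uart_regs, jtag_putc, jtag_puts and jtag_put_hex are made up for illustration, and the base address has to come from your own system.

```c
#include <stdint.h>

/* Assumed register layout of the JTAG UART core:
 * data at offset 0, control at offset 4. */
typedef struct {
    volatile uint32_t data;
    volatile uint32_t control;
} jtag_uart_regs;

/* Upper 16 bits of control: free space in the write FIFO. */
#define JTAG_UART_WSPACE_MASK 0xffff0000u

/* Busy-wait until the write FIFO has room, then queue one byte. */
static void jtag_putc(jtag_uart_regs *u, char c)
{
    while ((u->control & JTAG_UART_WSPACE_MASK) == 0)
        ;
    u->data = (uint8_t)c;
}

/* Write a NUL-terminated string, one byte at a time. */
static void jtag_puts(jtag_uart_regs *u, const char *s)
{
    while (*s)
        jtag_putc(u, *s++);
}

/* Write a 32-bit value as eight hex digits, most significant first. */
static void jtag_put_hex(jtag_uart_regs *u, uint32_t v)
{
    static const char digits[] = "0123456789abcdef";
    int shift;

    for (shift = 28; shift >= 0; shift -= 4)
        jtag_putc(u, digits[(v >> shift) & 0xf]);
}
```

Note that the busy-wait will spin forever if nothing on the host side is draining the FIFO, so you may want a timeout or a "drop if full" variant. None of this pulls in stdio or malloc.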
If you look at the code in alt_main() and main() (and the code that calls the former) you'll see a load of stuff to do with stdio and malloc.
Particularly annoying is all the tidy-up code!