Hey Jesse,
Boy, I hope you can type 70+ wpm!
Where there is a will there is a way. I think you guys are plenty smart enough to figure out a clean multi-CPU system. Soft cores and FPGAs cry out for multi-CPU solutions.
Maybe you build and link the system much like you do for a single CPU, but use compiler directives to assign code to CPUs (much like you can assign code and variables to memories).
Or just bite the bullet and add the necessary logic to the linker to place multiple builds into N shared (and some non-shared) memories. You could start with a simple first-come, first-served algorithm for partitioning, much like the manual linker script editing.
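A rough sketch of that first-come, first-served idea, just to make it concrete. Everything here is hypothetical (the mem_region_t and section_t types, the place_first_fit name, the sizes); it is not how any real linker does it, and it ignores alignment entirely. Each section is taken in order and dropped into the first memory that still has room.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptors, invented for this illustration only. */
typedef struct {
    const char *name;
    uint32_t base;   /* start address of the memory */
    uint32_t size;   /* total bytes available */
    uint32_t used;   /* bytes already handed out */
} mem_region_t;

typedef struct {
    const char *name;
    uint32_t size;   /* bytes required */
    int region;      /* index of the chosen memory, -1 if unplaced */
    uint32_t addr;   /* assigned address (valid only if region >= 0) */
} section_t;

/* Place each section, in the order given, into the first memory with
 * enough free space. Returns the number of sections left unplaced. */
int place_first_fit(section_t *secs, size_t nsecs,
                    mem_region_t *mems, size_t nmems)
{
    int unplaced = 0;
    for (size_t i = 0; i < nsecs; i++) {
        secs[i].region = -1;
        for (size_t j = 0; j < nmems; j++) {
            if (mems[j].size - mems[j].used >= secs[i].size) {
                secs[i].region = (int)j;
                secs[i].addr = mems[j].base + mems[j].used;
                mems[j].used += secs[i].size;
                break;
            }
        }
        if (secs[i].region < 0)
            unplaced++;
    }
    return unplaced;
}
```

Listing the small on-chip memories first means the tight slave code lands there until it runs out, and everything else spills to the bigger memory.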
Anyway, any company that can create SOPC Builder can crack this nut. Think about the complex interconnections and dependencies that are specified there with so little effort on the user's part.
I think I'll stick with my multiple NiosI system architecture for now. It's basically what you suggested, with a master running out of SDRAM and the slaves running super tight code within on-chip memory. Although I guess I need a way to place the slave code. I guess I build a cpu_2.hex, cpu_3.hex, etc. Then the slaves check their mailboxes for commands and data.
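For what the slave side of a mailbox might look like, here is a minimal sketch. The mailbox_t layout, the CMD_DOUBLE command, and mailbox_poll_once are all made up for illustration; on real hardware the mailbox would have to sit in shared memory that both CPUs see uncached.

```c
#include <stdint.h>

/* Hypothetical mailbox layout shared by the master and one slave. */
typedef struct {
    volatile uint32_t cmd;     /* 0 = empty, nonzero = command from master */
    volatile uint32_t data;    /* argument written by the master */
    volatile uint32_t result;  /* reply written by the slave */
} mailbox_t;

enum { CMD_NONE = 0, CMD_DOUBLE = 1 };  /* example commands, assumed */

/* One polling step for a slave: returns 1 if a command was handled,
 * 0 if the mailbox was empty. */
int mailbox_poll_once(mailbox_t *mb)
{
    uint32_t cmd = mb->cmd;
    if (cmd == CMD_NONE)
        return 0;
    if (cmd == CMD_DOUBLE)
        mb->result = mb->data * 2u;
    /* Clear the command last so the master can treat cmd == 0 as "done". */
    mb->cmd = CMD_NONE;
    return 1;
}
```

The slave's main loop would then just call mailbox_poll_once forever, and the master writes data first and cmd last.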
Could your engineers write a tight reset loop that consumes, say, one M4K block (like your cool bootloader) and loops there until the master loads the slave's code somewhere and then writes a jump-to address to break that tiny loop?
Actually I guess I could write it, since I wrote an srec loader. That was in C and a lot more than one M4K! The loop and jump-to in asm would be nothing though.
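The loop itself is tiny even in C. A sketch, assuming the master writes a nonzero entry address into an agreed-upon shared word (the jump_slot location and the wait_for_entry name are hypothetical, and zero is assumed to mean "not loaded yet"):

```c
#include <stdint.h>

typedef void (*entry_fn)(void);

/* Spin until the master writes a nonzero entry address into the slot,
 * then hand it back as a callable function pointer. */
entry_fn wait_for_entry(volatile uintptr_t *jump_slot)
{
    uintptr_t addr;
    while ((addr = *jump_slot) == 0) {
        /* nothing to do until the master has loaded our code */
    }
    /* Integer-to-function-pointer conversion is implementation-defined,
     * but it is the usual idiom on small embedded targets. */
    return (entry_fn)addr;
}
```

The reset code would call wait_for_entry on the slot and then jump through the returned pointer. The volatile qualifier is what keeps the compiler from hoisting the read out of the loop.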
One thing you could pass on is that not everyone has the largest Stratix device to hold large full-service footprints.
:)
Thanks,
Ken