The 'e' processors will be very slow indeed, especially if executing code from SDRAM. You'd do better with a single 'f' processor executing an action for each slave in turn.
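As a rough illustration of "an action for each slave in turn", something like the following loop is what I have in mind. This is only a sketch: NUM_SLAVES, slave_state_t and slave_action() are made-up names, and the real action would be whatever each slave processor was going to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLAVES 4            /* hypothetical number of slaves */

/* Hypothetical per-slave state kept by the single 'f' cpu. */
typedef struct {
    uint32_t steps;             /* how many actions have run for this slave */
} slave_state_t;

/* Hypothetical action: one short, bounded unit of work for one slave. */
static void slave_action(slave_state_t *s)
{
    s->steps++;
}

/* One pass: service every slave exactly once, in a fixed order. */
static void service_all(slave_state_t *slaves, size_t count)
{
    assert(slaves != NULL);
    for (size_t i = 0; i < count; i++) {
        slave_action(&slaves[i]);
    }
}
```

The main loop would then just call service_all() forever. Keeping each action short and bounded is what keeps the worst-case latency for any one slave predictable.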
For each 'slave' cpu expose the soft reset lines so that the supervisor can take them out of reset once everything is initialised - so remove the JTAG debug module.
If the code and 'normal' data for the slaves are small, put both in 'tightly coupled' memory that is dual ported to the Avalon bus (so the supervisor can initialise it). Then remove their i-cache, and probably their d-cache as well (depending on the access patterns to external memory).
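On the supervisor side the sequence would be roughly: copy the slave's image into its memory through the Avalon port, then drop its reset. A minimal sketch, assuming the reset lines are driven from a register where bit n set means slave n is held in reset (that layout, and both function names, are my invention, not anything Altera supplies):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a slave's code/data image into its dual-ported memory, word by word. */
static void load_slave_image(volatile uint32_t *tcm,
                             const uint32_t *image, size_t words)
{
    assert(tcm != NULL && image != NULL);
    for (size_t i = 0; i < words; i++) {
        tcm[i] = image[i];
    }
}

/* Assumed register layout: bit n set = slave n held in reset.
 * Clearing the bit lets that slave start running. */
static void release_slave(volatile uint32_t *reset_reg, unsigned slave)
{
    assert(reset_reg != NULL && slave < 32);
    *reset_reg &= ~(UINT32_C(1) << slave);
}
```

On real hardware the pointers would be the Avalon addresses of the memory and the reset register, and if the supervisor has a d-cache its writes would need to bypass it or be flushed before the slave is released.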
Altera don't give any examples of running very small code/data, nor anything with hard separation between code and data. You'll need to link readonly data with the read-write data, not with the code. Read the wiki pages about gcc, be prepared to build it yourself.
There is also a 'hidden' config menu for the nios (which I'm willing to tell you about, but not how to get to!) which will let you further customise the cpu. In particular it lets you remove the dynamic branch predictor from the 'f'. For 'real time' work you probably don't want it.