I would think you'd want to build a master module for the delegating CPU, with single-line signaling to a slave module attached to each slave CPU. Each slave module can then send simple status signals like "idle", "finished_last_operation" and "working" back to the main CPU. That should cut latency and avoid the need to create an extra processing layer in whatever protocol you develop to send instructions. All of that could be hooked up point-to-point through the switch fabric. Just some general architecture observations.

Good luck, sounds like an interesting project. I wish I could do something like that for a while rather than building glue logic.
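To make the status-line idea a bit more concrete, here is a rough software model of it in Python. It is only a sketch of the handshake, not HDL and not tied to any particular fabric. The class names (`SlaveModule`, `MasterModule`), the one-tick job duration and the round-robin dispatch are all my own assumptions; only the three status names come from the description above. In real hardware, three states would need a couple of wires or some encoding on the line, which this model ignores.

```python
from enum import Enum


class Status(Enum):
    # The three sideband states mentioned in the post.
    IDLE = "idle"
    WORKING = "working"
    FINISHED_LAST_OPERATION = "finished_last_operation"


class SlaveModule:
    """Hypothetical model of the module attached to one slave CPU."""

    def __init__(self, name):
        self.name = name
        self.status = Status.IDLE  # value the master sees on the status line
        self.job = None
        self.result = None

    def start(self, job):
        # Master hands over a job; the slave raises "working".
        self.job = job
        self.status = Status.WORKING

    def step(self):
        # One simulated tick: a working slave completes its job (assumption).
        if self.status is Status.WORKING:
            self.result = self.job()
            self.status = Status.FINISHED_LAST_OPERATION

    def collect(self):
        # Master reads the result; the slave drops back to "idle".
        result, self.result, self.job = self.result, None, None
        self.status = Status.IDLE
        return result


class MasterModule:
    """Hypothetical model of the module on the delegating CPU."""

    def __init__(self, slaves):
        self.slaves = slaves

    def run(self, jobs):
        pending = list(jobs)
        results = []
        # Keep going until nothing is queued and every slave reports idle.
        while pending or any(s.status is not Status.IDLE for s in self.slaves):
            for s in self.slaves:
                # The master only looks at the status line, no protocol query.
                if s.status is Status.FINISHED_LAST_OPERATION:
                    results.append(s.collect())
                if s.status is Status.IDLE and pending:
                    s.start(pending.pop(0))
            for s in self.slaves:
                s.step()
        return results


if __name__ == "__main__":
    slaves = [SlaveModule("cpu0"), SlaveModule("cpu1")]
    master = MasterModule(slaves)
    print(master.run([lambda i=i: i * i for i in range(5)]))
```

The point of the model is that the master never has to send a "what are you doing?" message through the instruction protocol. It just samples each slave's status and decides whether to collect a result or dispatch the next job.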