--- Quote Start ---
I implemented the 'swap' that is controlled as a custom instruction.
--- Quote End ---
AFAIK, that would mean adding a separate memory interface for this instruction, as the infrastructure of the NIOS design does not allow a custom instruction to use the processor's own memory interface. This of course rules out using a cache within the processor. I suppose that an external cache (i.e. a 2nd level cache) in place of the 1st level cache provided by Altera would slow down the CPU a lot.
--- Quote Start ---
Unfortunately, we can't use cache non-cache information outside of Nios2 core, so I changed the kernel memory mapping
--- Quote End ---
Maybe you could use the old A31 trick (A31=1 -> cache bypassed). With that you could define non-cacheable regions via the MMU target address, i.e. by mapping those regions to target addresses that have A31 set.
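
To illustrate what I mean by the A31 trick, here is a minimal sketch of the address arithmetic only. It assumes that address bit 31 selects the cache-bypass alias of the same location, as described above; the macro and function names are made up for this example and are not taken from any Altera header.

```c
#include <stdint.h>

/* Assumption: address bit 31 (A31) set means "bypass the data cache".
 * All names below are hypothetical, for illustration only. */
#define A31_BYPASS_MASK UINT32_C(0x80000000)

/* Return the cache-bypassing alias of an address (A31 forced to 1). */
static inline uint32_t a31_uncached_alias(uint32_t addr)
{
    return addr | A31_BYPASS_MASK;
}

/* Return the cached alias of an address (A31 forced to 0). */
static inline uint32_t a31_cached_alias(uint32_t addr)
{
    return addr & ~A31_BYPASS_MASK;
}

/* Non-zero if the address already refers to the bypass alias. */
static inline int a31_is_uncached(uint32_t addr)
{
    return (addr & A31_BYPASS_MASK) != 0;
}
```

A page table entry for a non-cacheable region would then carry a31_uncached_alias(addr) as its target address instead of addr. Whether the core in question still honours A31 once the MMU is enabled is something I have not verified.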
But I don't think the problem with inter-CPU atomic instructions is solvable :(.
-Michael