Actually, I have it working with 32KB caches now. I'm not sure what changed (I've been through a few FPGA updates since I last tried it). I think at least some support for the aliasing problem is already there; see cache_flush.c, syscall.c, and Documentation/cachetlb.txt.
I did make the change below. I haven't noticed any difference with or without it, but it seems more correct given the COLOUR_ALIGN macro in syscall.c and the documentation in cachetlb.txt:
--- a/arch/nios2/include/asm/shmparam.h
+++ b/arch/nios2/include/asm/shmparam.h
@@ -1 +1,2 @@
-#include <asm-generic/shmparam.h>
+#include <asm/nios.h>
+#define SHMLBA DCACHE_SIZE
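
To illustrate why SHMLBA matters for COLOUR_ALIGN, here is a rough userspace sketch. It is not the nios2 code: the macro is modelled on the COLOUR_ALIGN pattern found in other architectures, and the 4KB page size, the 32KB DCACHE_SIZE, the direct-mapped assumption, and the cache_colour() helper are all my assumptions for illustration.

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the nios2 tree. */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define DCACHE_SIZE  (32UL * 1024)     /* assumed direct-mapped data cache */
#define SHMLBA       DCACHE_SIZE       /* what the patch above selects */

/*
 * Modelled on the COLOUR_ALIGN pattern used by other architectures:
 * round addr up to an SHMLBA boundary, then add the offset that the
 * file page offset occupies within one SHMLBA-sized window.
 */
#define COLOUR_ALIGN(addr, pgoff) \
    ((((addr) + SHMLBA - 1) & ~(SHMLBA - 1)) + \
     (((pgoff) << PAGE_SHIFT) & (SHMLBA - 1)))

/* Hypothetical helper: which page-sized slice of the cache vaddr indexes. */
static unsigned long cache_colour(unsigned long vaddr)
{
    return (vaddr & (SHMLBA - 1)) >> PAGE_SHIFT;
}
```

Under these assumptions there are DCACHE_SIZE / PAGE_SIZE = 8 colours. Two mappings of the same pgoff always get the same colour after COLOUR_ALIGN, whatever hint address they start from. With SHMLBA left at PAGE_SIZE (the asm-generic default) the mask is a no-op, so two mappings of the same page could land on different cache lines.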