Hi guys,
Sorry, I don't mean to ignore the conversation here or keep quiet about it. We are rapidly approaching our next Nios/Quartus/SOPC Builder release and have the associated time crunch to deal with. I will try to post something more useful early next week.
There are several recently introduced but not-yet-documented Avalon features I want to discuss. They won't solve the immediate problem Dirk presents (successive loads from SDRAM where the cache misses every time), but they will help in complex (multi-master) systems where getting the best memory bandwidth is key. Additionally, our aforementioned next release has several more features (and documentation!) that will speed things up further. Sorry, latency awareness on the CPU data master isn't one of them, but as I said, we will be giving it a serious look.