Hi Thomas,
Are you saying Jesse's timetable doesn't apply to on-chip RAM? He didn't qualify it as such. If it does apply, then on-chip reads should take 7 clocks. (I would expect the LDxIO instructions to do it in 6, or even 5 if they can also skip the align.) So DMA'ing to on-chip RAM seems of little help unless that on-chip RAM is the cache itself.
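To put rough numbers on that, here is a minimal sketch in C. The per-read costs are just the figures from this thread (7 clocks for a normal on-chip read, 6 or 5 as my guess for LDxIO), not measurements, and the helper names are made up for illustration. It also ignores DMA setup time and any pipelining.

```c
#include <assert.h>
#include <stdint.h>

/* Per-read costs in clocks. These are the figures discussed in this
   thread (assumptions, not measured values). */
enum {
    CLOCKS_NORMAL_READ = 7, /* on-chip read, if the timetable applies */
    CLOCKS_LDXIO_GUESS = 6, /* expected cost of an LDxIO read */
    CLOCKS_LDXIO_BEST  = 5  /* LDxIO if the align can also be skipped */
};

/* Total clocks for n_reads back-to-back reads at a fixed per-read cost.
   Hypothetical helper; no overlap between reads is modelled. */
static uint32_t total_clocks(uint32_t n_reads, uint32_t clocks_per_read)
{
    return n_reads * clocks_per_read;
}

/* Clocks saved by moving n_reads reads from the slow path to the fast path. */
static uint32_t clocks_saved(uint32_t n_reads, uint32_t slow, uint32_t fast)
{
    assert(slow >= fast); /* the "fast" path must not cost more */
    return total_clocks(n_reads, slow) - total_clocks(n_reads, fast);
}
```

For a 1000-word buffer, `clocks_saved(1000, CLOCKS_NORMAL_READ, CLOCKS_LDXIO_BEST)` comes out to 2000 clocks out of 7000, so even the best case only trims the read cost by a bit under a third.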
I've wondered about using a custom instruction interface to bypass the Avalon bus. I'm not sure how, or whether, it would be integrated into the memory map, but even as a rogue interface it would be well worth it to gain 1-clock access, at least to on-chip RAM. Maybe a smart guy like you could do this.
:)
Ken