Hi Ken,
(I still owe you guys a write-up, it will come soon I swear!)
My earlier comments about latency awareness were a bit misguided. A subsequent poster in that thread hit the nail on the head -- in a CPU you can't just queue reads (that is what utilizing latency awareness implies: you 'post' reads and get them back in succession). The reason makes my earlier statement look a bit dumb: it's a processor. There isn't a way to know whether the data you're reading in one instruction has relevance in the next instruction and so forth. For this reason it's crucial, for performance, to have things cached, or to utilize some other HW (DMA) that can take advantage of latency to shovel things around.
James could probably elaborate more on the above, discussing things such as scoreboarded loads, but I will leave that to him if he wishes as I'm not the processor expert (aside from sitting next to James).
That said, I think I know what James was getting at in the last post: the data cache line size. When a data cache "line" needs to be updated, it is done a line at a time. So faster SDRAM access from the CPU can be achieved if your cache lines are big enough to permit latency-awareness (pipelining) of the reads that fill the line; increasing our cache line size would do that. Now, that said, there are probably reasons for, and ramifications of, the line size being only 32 bits... I'll leave that to James.
One thing I'd like to include in my long-overdue write-up is a discussion of ways to simplify DMA transfers (make them take fewer instructions to set up) to help alleviate this.
As for your original SDRAM question: I'm afraid I'm not the DDR expert. I'll be learning more about it in the coming months though as our next dev boards will include DDR SDRAM.