Hi all,
Regarding what Jesse said, I might be wrong, but as far as I know, whenever you have an Avalon master device that you intend to use with the Altera SDRAM controller, that master HAS to use the 'readdatavalid' signal. The 'waitrequest' signal is only used by the SDRAM controller to hold off new requests from the master when there is no more room in its pipeline.
In order to optimize SDRAM access, you have to be able to fill the SDRAM pipeline with requests. This means you have to request data in advance, whether through 'dumb', DMA-like approaches or 'smart' techniques like caching or instruction/branch prediction. When you do random data reads, optimizing SDRAM access requires help from an operation-dedicated cache controller (one that can 'predict' where your next reads will be) or a very big cache. Even if you are not changing the row address, holes in the pipeline will cause poor performance.
Now, probably an Altera guru can explain this more accurately, but this is how I would explain Dirk's findings: the 12 clock cycles for a read come from the fact that the data cache doesn't request/store words of data from the SDRAM in advance. The cache simply waits for the CPU to request a word, encounters a miss, passes the request to the SDRAM controller, gets the data back and passes it to the CPU.
So we probably have 3 clock cycles until the request hits the Avalon bus, then 3 cycles until it hits the SDRAM chip, 3-4 clocks inside the SDRAM (read + CAS latency), +1 clock back to the Avalon bus, +1 clock at least to the CPU, +1 clock for the next cached instruction: VOILA, roughly 12 cycles.
Add 3-4 more clocks if you need to change the SDRAM row, depending on clock speed.
During all these cycles, the CPU simply waits, doing nothing else. It would be nice if we could fill up the pipeline and eliminate the wasted clocks. This can't be done unless the cache controller burst-reads some words in advance (with the added penalty if you end up not using them). It also means that if you do a lot of random reads you will get far worse performance, since you will get a burst read for each access when in fact you only need one word here and there.
These issues are not Altera-specific; they also occur in other systems using processors like ARM or x86 variants. The only difference is that those systems usually run the SDRAM at 133 MHz or above, so the impact on performance is less visible. With a Cyclone device, these speeds are hard to achieve.
One thing Altera could do is add burst-type access to the SDRAM controller, add burst-read capability to the data cache, and let the user enable/disable these in SOPC Builder. This way, each user can try both approaches and decide which one fits his/her application best.
Now, I have the feeling I'm forgetting something.... Oh yes, why are the writes on Dirk's example taking only one cycle each?
The most reasonable explanation is the write-back capability of the cache. The cache controller simply stores all the requests and then commits them to the SDRAM at once, DMA-style.
What would be interesting to see here (and we can't see it in the timing graphs) is how many cycles pass between the CPU initiating the first write and the SDRAM receiving the actual write command at its pins.
Hopefully you will get more interesting details from Altera.
Regards,
C