Hello to Ken and the other guys discussing this topic.
Here is the link to the image that Ken mentioned:
http://www.entner-electronics.com/images/nios2sdram_with_explanation.jpg
As you can see, there is also a delay when writing to the SDRAM: Altera's SDRAM controller has 2 write buffers. Therefore the first 2 writes operate at full speed, that is, 2 cycles per write. Then wait states are inserted until one of the two write buffers becomes free for the third write, and so on. If you had 8 back-to-back writes instead of 4, you could also see this on the SDRAM signals.
When reading, things get worse: here the latencies of the SDRAM and of the SDRAM controller take full effect. The Nios II core itself also requires several cycles, so even with internal SRAM you have about 4 cycles per read (I looked at it but do not remember the exact number; maybe it was 3, more likely 5 or 6...).
SDRAM controllers are a topic I could discuss for hours, so I will try to keep it short (many things were already mentioned before):
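To make the write-buffer behaviour a bit more concrete, here is a toy model in C. It is only a sketch: the 2 buffers and the 2 cycles per write come from the description above, but the drain time (how long the SDRAM side needs to empty one buffer) is a made-up parameter, and the function name total_write_cycles is hypothetical. It is not a description of how Altera's controller is actually implemented.

```c
#include <assert.h>

#define MAX_WRITES 64

/*
 * Toy model: returns the cycle at which the CPU has handed over the
 * last of n back-to-back writes.
 *   buffers : number of write buffers (2 in the post)
 *   issue   : CPU-side cycles per write at full speed (2 in the post)
 *   drain   : ASSUMED cycles the SDRAM side needs to empty one buffer
 */
unsigned total_write_cycles(unsigned n, unsigned buffers,
                            unsigned issue, unsigned drain)
{
    unsigned done[MAX_WRITES]; /* cycle at which write i has left its buffer */
    unsigned now = 0;
    unsigned i;

    assert(buffers > 0 && n <= MAX_WRITES);

    for (i = 0; i < n; i++) {
        unsigned start;

        /* Wait states: stall until the buffer used `buffers` writes ago is free. */
        if (i >= buffers && done[i - buffers] > now)
            now = done[i - buffers];

        now += issue; /* hand the write over to the controller */

        /* The SDRAM side drains the buffers one at a time, in order. */
        start = (i > 0 && done[i - 1] > now) ? done[i - 1] : now;
        done[i] = start + drain;
    }
    return now;
}
```

With an assumed drain time of 6 cycles, total_write_cycles(2, 2, 2, 6) gives 4 (both writes at full speed), while the third write is only accepted at cycle 10 and the fourth at cycle 16, because each has to wait for a buffer to become free.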
- The Altera controller always keeps only one bank open (you can see in the diagram that the writes are in bank 1 and the reads in bank 0, yet they get precharged anyway). This is very conservative; it could at least have activated bank 1 before precharging bank 0. On the other hand: what do these 3 cycles help when it needs about 50 for reading 4 words...
- The controller has 2 write buffers, which will help a lot in many applications.
- The 11 or 12 cycles per read apply when the program runs from ANOTHER memory or cache.
- I have not checked it, but I suppose that reading the program memory is much more efficient (and more important in most cases).
- Making the data master latency-aware would be tough: it would need to guess what data the program will read next and preload it into a buffer / small cache.
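To put the cycle counts above into perspective, the small C helper below turns cycles per read into a rough bandwidth figure. This is just arithmetic, under two assumptions that are not from the measurements above: a 50 MHz system clock and 4 bytes (one 32-bit word) per read. The function name read_bytes_per_second is hypothetical.

```c
#include <assert.h>

/*
 * Rough sustained read bandwidth in bytes per second (rounded down).
 *   clock_hz        : system clock in Hz (ASSUMED, e.g. 50 MHz)
 *   cycles_per_read : cycles the CPU spends per read (11 or 12 above)
 *   bytes_per_read  : bytes returned per read (ASSUMED 4, one 32-bit word)
 */
unsigned long long read_bytes_per_second(unsigned long long clock_hz,
                                         unsigned cycles_per_read,
                                         unsigned bytes_per_read)
{
    assert(cycles_per_read > 0);
    return clock_hz * bytes_per_read / cycles_per_read;
}
```

For example, read_bytes_per_second(50000000ULL, 12, 4) is about 16.7 million bytes per second, compared with 100 million for a hypothetical 2 cycles per read at the same clock.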
Do not forget that Altera must keep not only performance in mind, but also LC count. A design that is very fast but requires e.g. 6,000 LCs would not help much either. We are talking about a Nios II with about 2,000 LCs (core + SDRAM), not about an Athlon 64 with I don't know how many million gates. Somewhere there will be performance bottlenecks.
What can we do?
- Increase the clock-rate
- Use DMA
- Solve the specific problem with our own logic (nice, we have an FPGA...)
I will most likely design an SDRAM and a DDR-II controller with an interface for very fast video transfers (or other streaming things) and a Nios II Avalon interface within the next few months. But I do not think that I will address this specific issue, as I will use the "fast video interface" for tasks that require maximum performance. If there is interest, I could also offer it as an IP for Nios II (but not for free ;-).
Regards
Thomas
http://www.entner-electronics.com