Altera_Forum
Honored Contributor
10 years ago
NiosII with Altera SDRAM controller performance
Hello all. This is not a question, just a short note about connecting a Nios II to the Altera SDRAM controller, which I hope will be useful to everyone.
I have started a Nios II design with the Altera SDRAM controller. As I plan to share the SDRAM with another master on the Avalon-MM bus, I looked at the data throughput of the SDRAM controller. My conclusion is that the default connection made by SOPC Builder or Qsys, even between just a Nios II/f and the SDRAM controller, is far from optimal, and that a small adjustment makes it much better.

The quick solution, which I have not seen elsewhere (but I am a beginner with Nios II), is to select "Show Arbitration Shares" on the SDRAM connection and to set the share to 8 (instead of the default value of 1) for both the instruction master and the data master of the Nios II connected to the SDRAM controller.

Here is a quick explanation of why it should be 8 instead of 1.

Some of my friends told me to enable bursts in the cache properties of the Nios II/f to improve performance. It now seems clear to me that this can make performance even worse, so it is a bad idea.

After reading all the documentation I could find on the SDRAM controller (it does not give many details) and running some simulations in ModelSim, here is my understanding. The SDRAM controller cannot perform burst accesses on the Avalon-MM bus (by design). It can, however, perform pipelined reads with variable latency, which give at least the same performance as burst accesses and maybe better. A pipelined read can transfer one word every clock cycle, even if the addresses are not sequential. (In a burst access only the first address is given; the following ones are implicit and the address bus becomes don't care.) For write accesses, Avalon-MM can obviously also transfer one word every cycle, since there is no latency for the data. So we have to forget burst accesses and look at pipelined read transfers.

The Nios II/f has cache memory for both instructions and data, usually with 32-byte cache lines, so every cached transaction is a packet of 32 bytes = 8 words of 32 bits.
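As a quick check of the arithmetic above, here is a minimal Python sketch. The function names and the latency value of 3 cycles are my own assumptions for illustration, not figures from the controller documentation. It only counts cycles for one cache line, comparing pipelined reads (one request per cycle, data returned after a fixed latency) with reads that wait for each word before issuing the next request.

```python
LINE_BYTES = 32   # cache line size mentioned in the post
WORD_BYTES = 4    # 32-bit Avalon-MM data width

def words_per_line(line_bytes=LINE_BYTES, word_bytes=WORD_BYTES):
    """Number of bus words needed to fill one cache line."""
    return line_bytes // word_bytes

def pipelined_read_cycles(words, latency):
    """Requests are issued back to back: pay the latency once,
    then one word comes back on every following cycle."""
    return latency + words

def unpipelined_read_cycles(words, latency):
    """Each request waits for its own data: pay the latency for every word."""
    return words * (latency + 1)

n = words_per_line()
print(n)                              # 8
print(pipelined_read_cycles(n, 3))    # 11
print(unpipelined_read_cycles(n, 3))  # 32
```

With these assumed numbers a pipelined line fill costs 11 cycles instead of 32, which is why pipelined reads can do as well as a burst.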
When I look at the Avalon read or write signals on the data bus, or at the read signal on the instruction bus (there is no write there), they are continuously asserted for at least 8 cycles (or more when there are wait states). Internally, the SDRAM controller is able to accept many read requests. The simulation results are very promising. Basically, the Nios II/f and the SDRAM controller are well matched and able to perform read and write accesses in pipelined mode: one transfer every clock cycle (after the first, which has latency). Good news!

So where is the problem? The problem arises when 2 masters try to perform an access at the same time, and this is not rare. The Nios II/f is itself a multi-master: it has both an instruction master and a data master. It fetches instructions often (especially with linear code without loops) and also transfers data. SOPC Builder or Qsys automatically inserts an arbiter for multi-master access. The problem is that, by default, Qsys or SOPC Builder gives each master an arbitration share of 1, and 1 actually means one word transfer. So when both masters ask for 8 words, the first master is allowed to transfer only one word, then it is stalled so that the other master can also transfer only one word, and the two masters then keep alternating one word at a time.

At the SDRAM controller, the result is catastrophic. Instructions are usually not in the same address range as data or stack, so they are not in the same SDRAM row (maybe not in the same bank either, but the SDRAM controller opens only one bank at a time). For each word, the SDRAM controller has to PRECHARGE (close) the current bank (tRP, maybe 3 cycles), ACTIVATE (open) the next one (tRCD, maybe 2 or 3 cycles) and perform one READ (CL latency, maybe 2 or 3 cycles). Multiply this by 16 word transfers (8 instruction and 8 data). With an arbitration share of 8 (8 word-transfer credits for each master), each master instead needs one PRECHARGE, one ACTIVATE and one READ with CL latency, and then only 7 more cycles for the remaining 7 words.
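To put rough numbers on this, here is a small Python sketch of the reasoning above. It is a crude model under my own assumptions, not the controller's real state machine: the names `grant_order` and `sdram_cycles` are hypothetical, tRP, tRCD and CL are all set to 3 cycles (picked from the "maybe" values above), instruction and data are assumed to always sit in different rows, the full PRECHARGE + ACTIVATE + CL penalty is charged on every row change including the very first access, and refresh is ignored.

```python
def grant_order(words_per_master, share, masters=("instruction", "data")):
    """Order in which a round-robin arbiter grants single-word transfers
    when every master gets `share` consecutive words per turn."""
    remaining = {m: words_per_master for m in masters}
    order = []
    while any(remaining.values()):
        for m in masters:
            grant = min(share, remaining[m])
            order.extend([m] * grant)
            remaining[m] -= grant
    return order

def sdram_cycles(order, t_rp=3, t_rcd=3, cas_latency=3):
    """Crude cycle count: each master lives in its own row, so every change
    of master costs PRECHARGE + ACTIVATE + CAS latency, then 1 cycle/word."""
    open_row = None
    cycles = 0
    for master in order:
        if master != open_row:          # row miss: close, open, wait for CL
            cycles += t_rp + t_rcd + cas_latency
            open_row = master
        cycles += 1                     # one word transferred
    return cycles

print(sdram_cycles(grant_order(8, 1)))  # 160
print(sdram_cycles(grant_order(8, 8)))  # 34
```

With these assumed timings, two 8-word line fills cost about 160 cycles with a share of 1 and about 34 cycles with a share of 8. The exact figures depend on the real SDRAM timing parameters, but the ratio shows why the share setting matters.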
Why doesn't Altera give each master a share of 8 by default in this hidden and poorly explained feature? I guess that many designs could be optimized :)

Michel.

Note: the simulation can be done by designing a small system (one Nios II/f, one 32-bit SDRAM, one JTAG UART), then building a small Hello World application in the Nios II EDS and using Run As -> ModelSim. In Qsys or SOPC Builder, the Nios II reset address should be in SDRAM, and the SDRAM controller should include the functional memory model. Once in ModelSim, type s, w and run 300 us. There is no need for a Quartus top-level design, just the Qsys or SOPC Builder system.

Attached is a screenshot of this simulation. The first 2 accesses are single instruction reads returning 1 word every cycle (good!), but then comes a simultaneous instruction and data read. Look how scattered the readdatavalid signal is.