Altera_Forum
Honored Contributor
21 years ago
NIOS SDRAM performance
I have measured the speed of memory copies (memcpy-style) on the NIOS2. I use optimized code consisting of four consecutive read accesses followed by four consecutive write accesses. Code snippet:
while (i--) {
    d0 = __builtin_ldwio(pfrom);
    d1 = __builtin_ldwio(pfrom+1);
    d2 = __builtin_ldwio(pfrom+2);
    d3 = __builtin_ldwio(pfrom+3);
    pfrom += 4;
    __builtin_stwio(pto, d0);
    __builtin_stwio(pto+1, d1);
    __builtin_stwio(pto+2, d2);
    __builtin_stwio(pto+3, d3);
    pto += 4;
}

Compiling this with -O3 yields near-optimal code, with four reads into different registers followed by four writes:

movhi r7, %hiadj(1048576) # pfrom
addi r7, r7, %lo(1048576) # pfrom
movhi r6, %hiadj(1052672) # pto
addi r6, r6, %lo(1052672) # pto
movi r8, 15 # i
.L25:
ldwio r3, 0(r7) # d0, * pfrom
ldwio r4, 4(r7) # d1
ldwio r5, 8(r7) # d2
ldwio r9, 12(r7) # d3
addi r7, r7, 16 # pfrom, pfrom
stwio r3, 0(r6) # d0, * pto
stwio r4, 4(r6) # d1
stwio r5, 8(r6) # d2
stwio r9, 12(r6) # d3
addi r8, r8, -1 # i, i
cmpnei r3, r8, -1 # i
addi r6, r6, 16 # pto, pto
bne r3, zero, .L25

The transfer rates seemed too slow, so I investigated further. It turns out that the NIOS2 has very poor SDRAM read performance because it does not perform consecutive SDRAM read accesses (although it does for write accesses). Here is a link to an oscilloscope image of a read access:

oscilloscope: sdram read (http://dziegel.free.fr/nios2/sdram_read.jpg)

However, write access seems to be fine:

oscilloscope: sdram write (http://dziegel.free.fr/nios2/sdram_write.jpg)

Note: As you can see in the oscilloscope images, the accesses do not cross an SDRAM row boundary (there is no RAS cycle between reads).

Tests were performed on a NIOS 1C20 Development Kit, project: NIOS2 full_featured.

So my questions are:
- What are the reasons for the slow read performance? IMHO, the read requests could be executed at the same speed as the write requests.
- Will this behaviour be changed / fixed?

Thank you,
Dirk
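
For anyone who wants to reproduce the measurement, the following is a minimal, self-contained sketch of the copy loop from this post together with a helper that turns a cycle count into a transfer rate. The names LDWIO, STWIO, copy_words_unrolled and bytes_per_second are hypothetical and chosen only for illustration. The sketch assumes that the Nios II GCC toolchain predefines __nios2__; on any other compiler it falls back to plain volatile accesses so the logic can be checked on a host PC. It does not include the timing itself, which depends on the timer available in the target system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On a Nios II target, use the cache-bypassing I/O builtins from the post.
 * Elsewhere, fall back to plain volatile accesses (host-side check only).
 * Assumption: the Nios II GCC toolchain predefines __nios2__. */
#if defined(__nios2__)
#define LDWIO(p)    __builtin_ldwio((const volatile void *)(p))
#define STWIO(p, v) __builtin_stwio((volatile void *)(p), (int)(v))
#else
#define LDWIO(p)    (*(const volatile uint32_t *)(p))
#define STWIO(p, v) (*(volatile uint32_t *)(p) = (uint32_t)(v))
#endif

/* Copy `blocks` groups of four 32-bit words.
 * Each iteration issues four reads, then four writes, as in the post. */
static void copy_words_unrolled(uint32_t *pto, const uint32_t *pfrom,
                                size_t blocks)
{
    while (blocks--) {
        uint32_t d0 = (uint32_t)LDWIO(pfrom);
        uint32_t d1 = (uint32_t)LDWIO(pfrom + 1);
        uint32_t d2 = (uint32_t)LDWIO(pfrom + 2);
        uint32_t d3 = (uint32_t)LDWIO(pfrom + 3);
        pfrom += 4;
        STWIO(pto, d0);
        STWIO(pto + 1, d1);
        STWIO(pto + 2, d2);
        STWIO(pto + 3, d3);
        pto += 4;
    }
}

/* Convert a measured cycle count into a transfer rate in bytes per second.
 * `clock_hz` is the frequency of the clock the cycles were counted on. */
static double bytes_per_second(size_t bytes, uint64_t cycles, double clock_hz)
{
    assert(cycles != 0);
    return (double)bytes * clock_hz / (double)cycles;
}
```

To use it, call copy_words_unrolled(dst, src, n) between two reads of a free-running cycle counter, then pass 16 * n bytes and the counter difference to bytes_per_second. Comparing a read-only loop against a write-only loop in the same way should make the asymmetry described above visible in numbers rather than only on the oscilloscope.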