Altera_Forum
Honored Contributor
10 years ago
https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/hb/nios2/n2cpu_nii5v1.pdf
See "Instruction Performance" on page 5-11, 5-19, or 5-21 of that handbook, depending on which core you're using.

Your question asked about instruction performance, but if what you really care about is the execution time of higher-level C functions and loops, AN391 (https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/an/an391.pdf) is a good read. The Performance Counter IP block it describes is especially useful.

Many instructions can execute in a single cycle, but getting the compiler to emit the best code and building optimized hardware can each become a small research project in itself. For example, if you rewrote your delay() in a form that GCC handles just a little better, it looks like the loop would average 3 cycles per iteration on the Nios II/f ("F") core:
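void delay(void)
{
    /* 'register' asks GCC to keep i and limit in registers
       (r17 and r16 in the assembly), even without optimization */
    register int i = 0;
    const register int limit = 100000;
    for (i = 0; i < limit; i++) {
        /* empty body: the loop itself is the delay */
    }
}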
The assembly for delay() (gcc -S foo.c) is below. The loop itself is only two instructions: .L3 increments the loop counter i (r17), and .L2 is the blt that compares it against 100000 (held in r16):
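delay:
    addi sp, sp, -12          # prologue: save fp, r17, r16
    stw fp, 8(sp)
    stw r17, 4(sp)
    stw r16, 0(sp)
    addi fp, sp, 8
    mov r17, zero             # i = 0 (declaration)
    movhi r16, 2              # limit = (2 << 16) - 31072 = 100000
    addi r16, r16, -31072
    mov r17, zero             # i = 0 (for-loop init)
    br .L2
.L3:
    addi r17, r17, 1          # i++
.L2:
    blt r17, r16, .L3         # loop while i < limit
    addi sp, fp, -8           # epilogue: restore registers
    ldw fp, 8(sp)
    ldw r17, 4(sp)
    ldw r16, 0(sp)
    addi sp, sp, 12
    ret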