Altera_Forum
Honored Contributor
13 years ago
Nios instruction timings
I thought I understood the Nios instruction timings, and exactly when pipeline stalls occur. However, I've found some discrepancies between calculated and measured execution times.
This is a /f processor without the dynamic branch predictor, with all code and (almost) all data cycles going to tightly coupled memory. Apart from delays due to contention on the few Avalon data transfers, the execution time ought to be determinable. (I've measured the non-contended Avalon cycles.)

One place I've found an unexpected stall appears to be in the difference between:

    ldhu rx, 0(ry)
    add ra, rb, rc
    stb rd, 0(ra)
    bne re, zero, label

and

    add ra, rb, rc
    stb rd, 0(ra)
    ldhu rx, 0(ry)
    bne re, zero, label

Although there are no 'late result' stalls, if execution branches to label (forwards, so predicted not taken) then the second version has an additional stall, on top of the 4 cycles lost because the branch is mispredicted. I'm comparing the execution time of the above with the code path that takes a branch just before and merges just after; the difference between the two should be 1 clock (one is 7, the other 8).

Does anyone have any thoughts on this? Is there a lurking extra stall cycle when a memory load (etc.) precedes a mispredicted branch?

I'm also not sure I have enough mispredicted branches in my slow code path to account for the overall additional delays. I might try to get a SignalTap trace of the code addresses.
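For reference, the kind of "calculated" figure I mean can be sketched as below. This is only a rough model, assuming one cycle per instruction out of tightly coupled memory plus the 4-cycle mispredict penalty mentioned above; the function names and the sample numbers are made up for illustration and are not measurements.

```python
# Rough cycle model. Assumptions (not taken from any Altera document):
# every instruction issues in one cycle, and each mispredicted branch
# loses a fixed number of cycles (the 4 quoted in the post).
MISPREDICT_PENALTY = 4


def expected_cycles(instructions, mispredicted_branches=0, extra_stalls=0):
    """Estimated cycles for a code path under the assumptions above."""
    return instructions + mispredicted_branches * MISPREDICT_PENALTY + extra_stalls


def unexplained_stalls(measured_cycles, instructions, mispredicted_branches=0):
    """Measured cycles that the simple model does not account for."""
    return measured_cycles - expected_cycles(instructions, mispredicted_branches)


if __name__ == "__main__":
    # Placeholder numbers: 4 instructions, 1 mispredicted branch, and a
    # hypothetical measurement of 9 cycles.
    print(expected_cycles(4, mispredicted_branches=1))
    print(unexplained_stalls(9, 4, mispredicted_branches=1))
```

Any non-zero result from unexplained_stalls is the sort of lurking stall I'm asking about.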