Forum Discussion
Altera_Forum
15 years ago
The IPC (instructions per clock) is documented in the processor reference handbook; generally it is 1.
However:
- The result of a non-ALU instruction (ld, mul, shift) has to go via the register file, resulting in a 2 clock delay before the value can be used. I presume there is some 'result forwarding logic' within the ALU.
- Any Avalon MM transfers (including writes) are done synchronously and take at least 3 clocks (I haven't seen any writes taking 2 clocks).
- I think the SDRAM interface buffers at least 2 write requests, so the first 2 random writes to SDRAM (and maybe other memory) complete in 3 cycles.
- As documented, a branch costs 1 clock if predicted not taken, 2 clocks if predicted taken, and 4 clocks if mispredicted.

For relatively small code blocks it is possible to adjust the C source to avoid almost all of the pipeline stalls, usually by loading values into local variables and using 'asm volatile ("":::"memory")' to stop gcc reordering instructions.

__builtin_expect() can be used to set the static branch prediction for conditionals (sometimes it is necessary to put an empty 'asm' statement in an otherwise empty 'else' branch).

If you are trying to squeeze out every last drop of performance, the dynamic branch prediction will only slow things down. Getting the source right is better, unless a single branch instruction needs to be predicted in different directions at different times. Your Altera rep should be able to tell you how to disable the dynamic branch prediction.
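As a rough illustration of the source-level tuning described above, here is a minimal sketch for gcc. The function sum_pairs() and the macros barrier() and unlikely() are made-up names for this example, not anything from the Altera tools, and whether the stalls actually go away depends on the code gcc generates for your build, so check the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macros, named here for illustration only. */

/* Compiler-only barrier: emits no instructions, but gcc will not
 * move memory accesses across it. */
#define barrier()   __asm__ volatile("" ::: "memory")

/* Tell gcc the condition is expected to be false, so the common
 * path can be laid out as the fall-through (not taken) path. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array two words at a time and count
 * how many even-indexed words are zero. */
uint32_t sum_pairs(const uint32_t *src, size_t n_pairs, uint32_t *n_zero)
{
    uint32_t sum = 0;
    uint32_t zeros = 0;

    for (size_t i = 0; i < n_pairs; i++) {
        /* Load both values into locals first, so the load-to-use
         * delay of the first load can overlap with the second load. */
        uint32_t a = src[2 * i];
        uint32_t b = src[2 * i + 1];

        /* Keep gcc from moving the loads below this point. */
        barrier();

        if (unlikely(a == 0)) {
            zeros++;
        } else {
            /* Empty asm in the otherwise empty else branch, as
             * suggested above; it generates no instructions. */
            __asm__ volatile("");
        }

        sum += a + b;
    }

    *n_zero = zeros;
    return sum;
}
```

The barrier only constrains memory accesses; gcc is still free to schedule the arithmetic wherever it likes, so treat this as a starting point and not a guarantee.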