When I ported our shipping firmware (all C) to the NiosI/II, I knew it ran much slower than on our Coldfire board, but we're only now understanding why. (Image processing code runs 30%-50% on a 75MHz NII/S vs. a 55MHz CF.)
While checking Altera's website for the NiosII update, I noticed the footnote in the Core Summary Table saying that shifts on Cyclone take one clock cycle *per bit*. So our code that strips individual bytes out of long words using shift and mask can run up to 24X slower than on Coldfire, since the 24-bit shift needed to reach the top byte costs 24 clocks. (Ouch.)
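For anyone hitting the same thing, here is a rough sketch of the pattern in question and one possible way around it. The function names are made up for illustration, and I have not measured the second version on any Nios core, so treat it as something to try and time rather than a known fix. The idea is to let byte stores and loads split the word instead of the shifter.

```c
#include <stdint.h>
#include <string.h>

/* Shift-and-mask extraction: returns byte n of a 32-bit word,
   where n = 0 is the least significant byte and n = 3 the most.
   With a one-clock-per-bit shifter, n = 3 needs a 24-bit shift. */
uint8_t byte_by_shift(uint32_t word, unsigned n)
{
    return (uint8_t)((word >> (8u * n)) & 0xFFu);
}

/* Shift-free alternative (hypothetical helper): copy the word into a
   byte array and index it. The bytes come out in memory order, so on a
   little-endian CPU such as Nios II out[0] is the least significant
   byte, while on a big-endian CPU such as Coldfire out[0] is the most
   significant byte. */
void unpack_bytes(uint32_t word, uint8_t out[4])
{
    memcpy(out, &word, sizeof word);
}
```

Whether the compiler turns unpack_bytes() into plain byte accesses depends on the toolchain and optimization level, so it is worth checking the generated assembly before relying on it.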
What would really help is if the Programmer's Reference clearly stated the approximate clocks per instruction for the different core/chip combos.
If we knew what was fast, we could tailor our algorithms to the processor/chip.
Ken