Matthew mentioned that the next release of Nios II will have better multiply support for Cyclone devices.
Note that it will also have support for fast shifts and rotates. This was enabled by mux enhancements in
Quartus 4.1 (available to all users).
As for current multiply performance on Cyclone, the current Nios II implementations provide
no multiply hardware at all. Instead, the compiler is set up to call a function that uses other
ALU instructions to perform the multiply. Each multiply therefore costs a function call plus a
loop of ALU operations instead of a single instruction, which makes multiplies pretty slow.
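
To give a feel for why this is slow, here is a minimal sketch of the kind of shift-and-add
routine a software multiply can use. The name soft_mul32 is hypothetical and this is not the
actual library function the compiler calls; it only illustrates the technique of building a
multiply out of shifts, adds and tests.

```c
#include <stdint.h>

/* Hypothetical illustration: 32-bit multiply using only shift, add and test.
   Returns the low 32 bits of a * b. The loop runs once per bit of b up to
   its highest set bit, so the worst case is 32 iterations. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u) {
            product += a;   /* add the shifted multiplicand for this bit */
        }
        a <<= 1;            /* next bit of b is worth twice as much */
        b >>= 1;            /* move on to the next bit of the multiplier */
    }
    return product;
}
```

Every iteration costs several ALU instructions plus a branch, so a single multiply can easily
run to dozens of instructions.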
If you really need good multiply performance in Nios II, you'll need to wait for our next release
or check out the thread elsewhere on this forum about adding a multiply custom instruction.
As for comparing Nios II multiply/shift performance on Stratix and Cyclone, the multiply performance
using LEs on Cyclone won't match the DSP multipliers on Stratix, but it will be much improved over
using software emulation for multiplies. It falls short of Stratix because, when using LEs, you can
only multiply a few bits of the multiplier operand per cycle without killing the Fmax. So, on Cyclone,
it will take multiple cycles to compute the product, whereas on Stratix and Stratix II the dedicated
DSP multipliers do the job in 3 cycles (pipelining provides a throughput of 1 multiply per cycle).
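
As a rough way to picture the difference, the sketch below models an LE-based multiplier that
consumes a fixed number of multiplier bits per cycle, next to a pipelined multiplier with a
fixed latency. All names and the bits_per_cycle parameter are hypothetical; the actual number of
bits handled per cycle in the Nios II implementation is not stated here, so treat this as a
cycle-counting model only, not a description of the real hardware.

```c
#include <stdint.h>

/* Result of the hypothetical iterative (LE-style) multiplier model. */
typedef struct {
    uint32_t product;   /* low 32 bits of a * b */
    unsigned cycles;    /* one loop iteration stands for one clock cycle */
} mul_result;

/* Model: each cycle takes bits_per_cycle bits of the multiplier b, forms a
   small partial product and accumulates it. Assumption: bits_per_cycle is
   between 1 and 32; anything else returns a zeroed result. */
mul_result iterative_mul32(uint32_t a, uint32_t b, unsigned bits_per_cycle)
{
    mul_result r = {0u, 0u};
    uint32_t mask;
    unsigned shift;

    if (bits_per_cycle == 0u || bits_per_cycle > 32u) {
        return r;
    }
    mask = (bits_per_cycle == 32u) ? 0xFFFFFFFFu
                                   : ((1u << bits_per_cycle) - 1u);

    for (shift = 0u; shift < 32u; shift += bits_per_cycle) {
        uint32_t digit = (b >> shift) & mask;   /* bits handled this cycle */
        r.product += (a * digit) << shift;      /* accumulate partial product */
        r.cycles++;
    }
    return r;
}

/* Model of a fully pipelined multiplier: n independent back-to-back
   multiplies finish after latency + (n - 1) cycles, assuming no stalls. */
unsigned pipelined_total_cycles(unsigned n, unsigned latency)
{
    return (n == 0u) ? 0u : latency + (n - 1u);
}
```

For example, at an assumed 4 bits per cycle the model needs 8 cycles for one 32-bit multiply,
while a 3-cycle pipelined multiplier finishes 10 independent multiplies in 12 cycles.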
As for shift/rotate, we expect to be able to match the Stratix performance on Cyclone.