Forum Discussion

Altera_Forum
Honored Contributor
21 years ago

NIOS II DMIPS

I measured the performance of the full-featured Nios II core with Dhrystone 2.1.

On the Cyclone EP1C20F400C7 dev board, the result is about 35 DMIPS at a 50 MHz system clock.

But on the Stratix EP1S10F780C7 dev board, the result is about 58 DMIPS at the same 50 MHz system clock.
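Normalizing both scores by the shared 50 MHz clock (plain arithmetic on the figures above, not a new measurement) shows the Stratix build doing roughly 1.66 times as much Dhrystone work per clock:

```latex
\[
\frac{35\ \text{DMIPS}}{50\ \text{MHz}} = 0.70\ \text{DMIPS/MHz}\ (\text{Cyclone}), \qquad
\frac{58\ \text{DMIPS}}{50\ \text{MHz}} = 1.16\ \text{DMIPS/MHz}\ (\text{Stratix}), \qquad
\frac{1.16}{0.70} = \frac{58}{35} \approx 1.66
\]
```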

What is the reason? Why does the DMIPS figure differ for the same design on a different target device?

Regards.

8 Replies

  • Altera_Forum
    Honored Contributor

    The difference in DMIPS is due to the different way multiply and shift instructions are implemented.

    On a Stratix device, the DSP blocks provide hardware multipliers, which are used to provide a fast implementation of multiply and shift.

    On a Cyclone device, there are no DSP blocks so multiplies and shifts run slower.
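    The identity behind this reply is that a left shift by n is a multiplication by 2^n, and a logical right shift by n is the high word of a 64-bit product with 2^(32-n). The C sketch below only demonstrates that arithmetic; the function names are hypothetical, and it makes no claim about how the Nios II core actually routes shifts through the DSP blocks.

```c
#include <stdint.h>

/* Left shift as a multiply: the low 32 bits of x * 2^n equal x << n.
 * Valid for 0 <= n <= 31. */
uint32_t shl_via_mul(uint32_t x, unsigned n)
{
    /* In hardware, 2^n would come from a small decoder, not a shifter. */
    uint32_t pow2 = UINT32_C(1) << n;
    return x * pow2; /* unsigned multiply wraps modulo 2^32, like << */
}

/* Logical right shift as a multiply: the high 32 bits of the 64-bit
 * product x * 2^(32 - n) equal x >> n. Valid for 0 <= n <= 31. */
uint32_t shr_via_mul(uint32_t x, unsigned n)
{
    if (n == 0)
        return x; /* 2^32 would not fit a 32-bit multiplier operand */
    uint64_t product = (uint64_t)x * (UINT64_C(1) << (32u - n));
    return (uint32_t)(product >> 32); /* keep only the high word */
}
```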
  • Altera_Forum
    Honored Contributor

    When I ported our shipping firmware (all C) to the Nios I/II, I knew it performed much slower than on our ColdFire board, but we're only now understanding why. (Our image-processing code runs at 30%-50% on a 75 MHz Nios II/s vs. a 55 MHz ColdFire.)

    While checking Altera's website for the Nios II update, I noticed the footnote in the Core Summary Table saying that shifts on Cyclone take one clock cycle *per bit*. So our code that extracts individual bytes from long words using shift-and-mask is running 24X slower than on ColdFire. (Ouch.)
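    A simplified cost model makes the "24X" figure concrete. It assumes (a labeled simplification, not a datasheet figure) that a shift by n bits costs n cycles on a one-bit-per-clock shifter and 1 cycle on a single-cycle shifter, and it ignores fetch, issue, and masking costs. The helper names are hypothetical.

```c
#include <stdint.h>

/* Shift-and-mask extraction of byte k (0 = least significant). */
uint8_t extract_byte(uint32_t word, unsigned k)
{
    return (uint8_t)((word >> (8u * k)) & 0xFFu);
}

/* ASSUMPTION (simplified model): a shift by n > 0 bits costs n cycles on a
 * one-bit-per-clock shifter and 1 cycle on a single-cycle shifter.
 * Fetch, issue, and masking costs are ignored. */
unsigned shift_cycles(unsigned n, int single_cycle_shifter)
{
    if (n == 0)
        return 0; /* byte 0 needs only the mask, no shift */
    return single_cycle_shifter ? 1u : n;
}

/* Modeled shift cycles to split one 32-bit word into its four bytes. */
unsigned unpack_word_shift_cycles(int single_cycle_shifter)
{
    unsigned total = 0;
    for (unsigned k = 0; k < 4u; k++)
        total += shift_cycles(8u * k, single_cycle_shifter);
    return total;
}
```

    Under this model, extracting the top byte (a shift by 24) costs 24 cycles instead of 1, and unpacking all four bytes of a word costs 48 shift cycles instead of 3.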

    What would really help is if the Programmer's Reference clearly stated the approximate clocks per instruction for the different core/chip combinations.

    If we knew what was fast, we could tailor our algorithms to the processor/chip.

    Ken
  • Altera_Forum
    Honored Contributor

    Ken,

    The chapter "Nios II Core Implementation Details" in the Nios II Processor Reference Handbook contains detailed information on cycles-per-instruction and how each core gets implemented in different FPGA families. Have a look.

    http://www/literature/hb/nios2/n2cpu_nii51016.pdf

    Also note: As mentioned in another thread, the next release of the Nios II processor will include an option to build the multiply circuitry out of LEs in Cyclone FPGAs, which don't contain DSP blocks. This will allow Nios II on Cyclone to perform a multiply in very few clocks, just like on Stratix and Stratix II, at the expense of a few LEs.

    Matthew
  • Altera_Forum
    Honored Contributor

    Look in the Nios II Processor Reference Handbook. Chapter 16 contains instruction performance figures for each processor variant; hopefully that'll help.

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Originally posted by kenland @ Oct 18 2004, 09:25 AM

    While checking Altera's website for the NiosII update I noticed the footnote in the Core Summary Table that shifts on Cyclone are one clock cycle *per bit*. So our code that strips out individual bytes from long words using shift and mask is running 24X slower than on Coldfire. (ouch)

    --- Quote End ---

    The optimizer doesn't recognize multiple-of-8-bit shift-and-mask operations as byte accesses? Oi!

    If all else fails, sounds like a plausible application for a custom instruction. You lose portability, though.

    Another option is to cast a long* to a char* if you have an array, or to use a union if you have a scalar, although both approaches expose byte-order (endianness) issues.
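    A minimal sketch of both suggestions, with the byte-order issue handled explicitly. Nios II is little-endian, so the least significant byte sits at the lowest address; mapping byte significance to a memory offset through `host_is_little_endian()` (a hypothetical helper, as are the other names) keeps the result identical to the shift-and-mask version on either byte order. Accessing an object through `unsigned char *` is permitted by C's aliasing rules, and C11 explicitly permits reading a union member other than the one last stored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first
 * (little-endian, as Nios II does), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Array case: view 32-bit words through an unsigned char pointer.
 * k is the byte's significance (0 = least significant). */
uint8_t byte_via_char_ptr(const uint32_t *words, size_t word_index, unsigned k)
{
    const unsigned char *bytes = (const unsigned char *)words;
    unsigned offset = host_is_little_endian() ? k : 3u - k;
    return bytes[word_index * sizeof(uint32_t) + offset];
}

/* Scalar case: overlay the word with a byte array. */
typedef union {
    uint32_t word;
    uint8_t bytes[4];
} word_bytes;

uint8_t byte_via_union(uint32_t w, unsigned k)
{
    word_bytes u;
    u.word = w;
    return u.bytes[host_is_little_endian() ? k : 3u - k];
}
```

    Neither version issues a shift, so on a core with a one-bit-per-clock shifter the cost becomes a byte load (or a register-to-memory round trip for the union) instead of up to 24 shift cycles.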
  • Altera_Forum
    Honored Contributor

    Thanks guys.

    I read through Chapter 16, but I still don't know how many cycles a Nios II multiply takes on my Cyclone chip.

    Does anybody know conclusively whether the next Nios II, with the multiplier in LEs, will perform exactly as well on a Cyclone as on a Stratix I? Will it also give all shifts one-clock execution, as on Stratix with its DSP blocks?

    This is very important, as I am in the process of having my board respun with a Stratix I replacing the Cyclone. It's a BOM-budget-busting move that I would prefer to avoid if the next Nios II is going to level the performance differences.

    If they are not the same, how will Cyclone II fare vs. Stratix I vs. Stratix II with the upcoming Nios II cores?

    TIA,

    Ken
  • Altera_Forum
    Honored Contributor

    Matthew mentioned that the next release of Nios II will have better multiply support for cyclone devices.

    Note that it will also support fast shifts/rotates. This was enabled by mux enhancements in Quartus 4.1 (available to all users).
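    Fast shifts built from multiplexers are usually organized as a logarithmic barrel shifter: five layers of 2:1 muxes, where layer i shifts by 2^i when bit i of the shift amount is set, so every shift takes the same path regardless of its distance. The C model below illustrates that general technique; it is not Altera's documented implementation, and `barrel_shift_left` is a hypothetical name.

```c
#include <stdint.h>

/* Software model of a logarithmic barrel shifter. Each of the five stages is
 * a row of 2:1 multiplexers that either passes its input through or shifts
 * it by a fixed 1, 2, 4, 8, or 16 bits (fixed-distance shifts are just
 * wiring in hardware). Every shift amount traverses all five stages, so the
 * delay does not grow with the shift distance. */
uint32_t barrel_shift_left(uint32_t x, unsigned n)
{
    n &= 31u; /* this model uses only the low five bits of the amount */
    for (unsigned stage = 0; stage < 5u; stage++) {
        unsigned distance = 1u << stage;     /* 1, 2, 4, 8, 16 */
        unsigned select = (n >> stage) & 1u; /* mux select for this stage */
        x = select ? (x << distance) : x;    /* one row of 2:1 muxes */
    }
    return x;
}
```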

    As for current multiply performance on Cyclone, the current Nios II implementations provide no multiply hardware at all. Instead, the compiler is set up to call a function that uses other ALU instructions to perform the multiply. This results in pretty slow multiplies.
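    The kind of helper described here is typically a shift-and-add loop. The sketch below (hypothetical name `soft_mul`) is a generic version, not the actual library routine; GCC targets without a hardware multiplier typically call a libgcc helper such as `__mulsi3`, whose code may differ. The loop runs once per significant bit of `b`, up to 32 iterations of several instructions each, which is why software multiplies are slow.

```c
#include <stdint.h>

/* Shift-and-add multiply using only add, shift, and bit-test operations.
 * Returns the low 32 bits of a * b, matching C's unsigned '*'. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)
            product += a; /* add a * 2^i for each set bit i of b */
        a <<= 1;          /* next weight of the multiplicand */
        b >>= 1;          /* consume one multiplier bit */
    }
    return product;
}
```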

    If you really need good multiply performance in Nios II, you'll need to wait for our next release or check out the thread elsewhere on this forum about adding a multiply custom instruction.

    As for comparing Nios II multiply/shift performance on Stratix and Cyclone, multiply performance using LEs on Cyclone won't match the DSP multipliers on Stratix, but it will be much improved over software emulation of multiplies. This is because, when using LEs, you can only process a few bits of the multiplier operand per cycle without killing Fmax. So on Cyclone it will take multiple cycles to compute the product, whereas on Stratix I/II the dedicated DSP multipliers do the job in 3 cycles (pipelining provides a throughput of 1 multiply per cycle).
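    The two cases can be modeled in C under stated assumptions. `iterative_mul` (hypothetical) consumes `k` multiplier bits per clock, so a 32-bit multiply takes 32 / k cycles; `k` is a free parameter because the post does not say how many bits per cycle the LE multiplier will handle. The cycle-count helpers compare n independent multiplies on a pipelined unit (3-cycle latency, one issue per cycle, as described for Stratix) with a non-pipelined iterative unit, which is itself an assumption.

```c
#include <stdint.h>

/* Iterative multiplier model: each clock consumes k bits of b (k must be
 * 1, 2, 4, 8, 16, or 32) and adds one partial product a * digit * 2^bit.
 * Returns the low 32 bits of a * b and reports the cycle count. */
uint32_t iterative_mul(uint32_t a, uint32_t b, unsigned k, unsigned *cycles)
{
    uint32_t digit_mask = (k >= 32u) ? UINT32_C(0xFFFFFFFF)
                                     : ((UINT32_C(1) << k) - 1u);
    uint32_t product = 0;
    unsigned count = 0;
    for (unsigned bit = 0; bit < 32u; bit += k) {
        uint32_t digit = (b >> bit) & digit_mask; /* next k bits of b */
        product += (a << bit) * digit;            /* partial product */
        count++;                                  /* one clock per digit */
    }
    *cycles = count;
    return product;
}

/* n independent multiplies on a pipelined unit: the first result after
 * `latency` cycles, then one more result every cycle. */
unsigned pipelined_cycles(unsigned n, unsigned latency)
{
    return n == 0 ? 0u : latency + (n - 1u);
}

/* n multiplies on a non-pipelined iterative unit (ASSUMPTION: no overlap). */
unsigned iterative_cycles(unsigned n, unsigned k)
{
    return n * (32u / k);
}
```

    With k = 4, for example, each LE-based multiply would take 8 cycles, and 100 independent multiplies would take 800 cycles on the iterative unit versus 102 on the pipelined one under these assumptions.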

    As for shift/rotate, we expect to be able to match the Stratix performance on Cyclone.