
Altera_Forum
21 years ago

hw multiplier for Cyclone

With Nios I, I could select three implementations for multiplication (sw, mul_step, mul). In Nios II this feature is completely missing. Is there any way to tell SOPC Builder to integrate a hardware multiplier with only a few cycles of latency (1-3) into Nios II?

I can do it with a custom instruction, but how can I tell the compiler to use this function for multiplications? Or, better, how can I instruct SOPC Builder to use my multiply custom instruction for the mul instruction?

I know that Cyclone II (and Stratix) will have such predefined blocks, but right now I have to do it with a Cyclone, and I'm willing to spend some LEs on this feature.

Another question: what happened to the predefined "divide" custom instruction available in Nios I? There is a subdirectory for it in the "components" directory, but it's not visible under custom instructions in SOPC Builder.

Thanks for any help

Chris

18 Replies

  • Altera_Forum

    What he is trying to say is that if you put a big, massive multiplier in there, your Fmax will drop (multipliers implemented in LEs are pretty big).

    Currently I've been finding that even though the Stratix has multipliers implemented in DSP blocks, the multiplier is still the main factor limiting Fmax with Nios II.

    Without DSP blocks, building a hardware multiplier means you will have to make a compromise somewhere.

    That's why I was suggesting a bit-serial one (one bit per clock, around 35 clock cycles in total). It would be as slow as the division, but it would be small and still reasonably fast.
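    The bit-serial idea can be sketched as a C behavioral model. This is a minimal sketch, not an HDL implementation, and the function name bitserial_mul32 is hypothetical: each loop iteration stands for one clock cycle, so a 32-bit multiply takes 32 steps, and a few extra cycles for loading operands and reading the result would plausibly account for the "around 35" figure.

```c
#include <stdint.h>

/* Behavioral model of a bit-serial (shift-and-add) multiplier.
 * Each loop iteration stands for one clock cycle in hardware: if the
 * current multiplier bit is 1, add the shifted multiplicand into the
 * accumulator, then shift both operands. The result is the low 32 bits
 * of the product, which is all a C "int * int" needs. */
static uint32_t bitserial_mul32(uint32_t multiplicand, uint32_t multiplier,
                                unsigned *steps)
{
    uint32_t acc = 0;
    unsigned n = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (multiplier & 1u)
            acc += multiplicand;  /* conditional add of the partial product */
        multiplicand <<= 1;       /* next partial product is worth 2x */
        multiplier >>= 1;         /* move on to the next multiplier bit */
        n++;
    }
    if (steps != 0)
        *steps = n;               /* always 32: one step per bit */
    return acc;
}
```

    For example, bitserial_mul32(1234u, 5678u, &steps) returns 7006652 with steps set to 32. In hardware the accumulator and the two shifting operands are just a few registers, an adder and a small counter, which is why this form is so cheap in LEs.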
  • Altera_Forum

    Look for a hardware multiplier option for Cyclone and Cyclone II devices as part of the next full release of Nios II. This will release near the end of the year, and will be both shipped as an update and available for download.

  • Altera_Forum


    --- Quote Start ---

    Look for a hardware multiplier option for Cyclone and Cyclone II devices as part of the next full release of Nios II. This will release near the end of the year, and will be both shipped as an update and available for download.

    --- Quote End ---

    Sounds good, although I need the multiplier now.

    We got our own custom-instruction (CI) multiplier working perfectly. But whenever data structures are accessed, the code still calls the "__mulsi3" routine in the lib2-mul.c file; I assume these are pointer operations. The other file is "alt_exception.S", which contains some multiply and divide routines (in assembler).

    My idea is to add my custom instruction to these files to get fast multiplication. Can someone point me to information on when the routines in these files are called and what the difference between them is?

    Thanks a lot

    Chris
  • Altera_Forum

    Hi Chris,

    __mulsi3 is the standard GCC single-width integer multiply routine. It is called when the compiler knows that the Nios II processor lacks integer multiply instructions. It's a great place to put your multiply custom instruction. BTW, if you do any 64-bit math, you might see calls to __muldi3.
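    On __muldi3: a multiplier that delivers the full 64-bit product of two 32-bit operands is enough to implement it, because the low 64 bits of a 64x64 product need only one full 32x32 product plus the low halves of two cross products. The sketch below assumes libgcc's signature for __muldi3 (two long long operands, long long result), and umul32x64() is a hypothetical stand-in for the hardware; here it is plain C without '*' so it can be tested on a PC and cannot recurse on a core without a multiplier.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32x32 -> 64-bit hardware multiplier.
 * On the real system this would access the multiplier hardware; here it
 * is a shift-and-add loop that deliberately avoids the '*' operator, so
 * it cannot turn into a call to __mulsi3/__muldi3 on a core without a
 * hardware multiply. */
static uint64_t umul32x64(uint32_t a, uint32_t b)
{
    uint64_t acc = 0, addend = a;
    while (b != 0) {
        if (b & 1u)
            acc += addend;
        addend <<= 1;
        b >>= 1;
    }
    return acc;
}

/* 64 x 64 -> low 64 bits, built from 32 x 32 -> 64 products:
 *   a*b mod 2^64 = aL*bL + ((aL*bH + aH*bL) << 32)
 * The aH*bH term only affects bits 64 and up, so it is dropped.
 * Two's complement makes the same bit pattern correct for signed
 * operands, which is why one routine serves long long. */
long long __muldi3(long long a, long long b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint32_t aL = (uint32_t)ua, aH = (uint32_t)(ua >> 32);
    uint32_t bL = (uint32_t)ub, bH = (uint32_t)(ub >> 32);

    uint64_t low   = umul32x64(aL, bL);
    uint32_t cross = (uint32_t)umul32x64(aL, bH) + (uint32_t)umul32x64(aH, bL);

    return (long long)(low + ((uint64_t)cross << 32));
}
```

    On a PC, __muldi3(123456789LL, 987654321LL) returns 121932631112635269, the same as the native 64-bit multiply.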

    alt_exceptions.S contains the multiply and divide emulation routines, among other things. You can ignore these as long as the software for your system has been generated for Cyclone. The emulation routines are called only when a Nios II processor that lacks a hardware multiply attempts to execute a multiply instruction.

    Kerry
  • Altera_Forum

    If you need something now, then you will have to sacrifice something.

    Binary Multiplier ----> small hardware size, "long" latency (about 35 cycles), hardware must be mapped in and called explicitly (can't use "*" directly).

    Parallel Multiplier ----> large hardware size, short latency (probably 1-3 cycles depending on how it's implemented), hardware must be mapped in and called explicitly (can't use "*" directly), and, as James mentioned, can greatly reduce Fmax.

    Software Multiplier (like Kerry suggested) -----> no extra hardware, long latency, no hardware mapping needed (you just use "*" in your code).

    If any of those sound good to you let us know and we can provide more info on how to go about using any of these options.
  • Altera_Forum

    Hi Kerry,


    --- Quote Start ---

    __mulsi3 is the standard GCC single-width integer multiply routine. It is called when the compiler knows that the Nios II processor lacks integer multiply instructions. It's a great place to put your multiply custom instruction.

    --- Quote End ---

    Now we have the problem that we don't know how to get the compiler to rebuild "lib2-mul.c", where we call our custom multiply instruction. It has something to do with libgcc; are there compiler flags we can set?

    Perhaps a trivial question, but we know more about the hardware side than about compilers...
  • Altera_Forum

    Finally, we have our HW-based multiplier.

    The trick to make the compiler use our custom multiplier was to define a "__mulsi3()" routine somewhere in the project. This seems to override the built-in library function. The method only works if we use alt_main, which is OK for us because we don't use the HAL. Perhaps someone with deeper knowledge of the compiler can clarify this.
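    A likely explanation for why the override works is ordinary static linking: the linker only pulls an object out of libgcc.a to resolve symbols that are still undefined, so a __mulsi3 defined in the project's own object files wins. A minimal sketch of such an override follows, shown with the custom-instruction variant discussed earlier in the thread. It assumes the compiler predefines __nios2__ and provides the __builtin_custom_inii() builtin; MUL_CI_N is a hypothetical custom-instruction number (normally you would use the value generated into system.h for your instruction). The non-Nios branch exists only so the sketch can be tested on a PC.

```c
#include <stdint.h>

/* HYPOTHETICAL custom-instruction number; take the real value from the
 * system.h generated for your multiply custom instruction. */
#define MUL_CI_N 0

/* Project-local definition of libgcc's integer multiply helper.
 * Because it is defined in the project's own objects, the linker does
 * not need the libgcc.a version of __mulsi3. */
int __mulsi3(int a, int b)
{
#if defined(__nios2__)
    /* Nios II GCC builtin: custom instruction n with two int operands
     * and an int result. */
    return __builtin_custom_inii(MUL_CI_N, a, b);
#else
    /* Portable fallback for testing on a PC: shift-and-add on unsigned
     * values. It must not use '*', because on a core without a hardware
     * multiplier '*' would call __mulsi3 again and recurse forever. */
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, acc = 0;
    while (ub != 0) {
        if (ub & 1u)
            acc += ua;
        ua <<= 1;
        ub >>= 1;
    }
    return (int)acc;   /* low 32 bits are correct for signed operands too */
#endif
}
```

    On a PC, __mulsi3(-6, 7) returns -42. Why the override only worked together with alt_main is not explained by this sketch.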

    The hardware is a peripheral (not a custom instruction) implementing an asynchronous multiplier. Getting the result takes 3 clock cycles at 60 MHz on a speed-grade-8 Cyclone; the multiplier unit is defined as a multicycle path in Quartus, hence there is no impact on Fmax. A nice side effect of this implementation is that we get a 64-bit result with the same hardware and in the same time.

    We know that normally everything should be designed strictly synchronously, but sometimes other solutions are necessary.

    BTW: with the same method we implemented a 32/32 divider that takes fewer than 12 cycles, and we get the quotient and remainder at the same time.
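    Getting the quotient and remainder together is natural for hardware dividers, since the remainder is simply what is left once the last quotient bit has been decided. The sketch below illustrates this with a bit-serial restoring divider in C. That is a swapped-in technique (one quotient bit per step, 32 steps), not the asynchronous sub-12-cycle divider described above, and the names udiv32 and udiv32_result are hypothetical.

```c
#include <stdint.h>

/* Quotient and remainder produced by a single operation. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv32_result;

/* Bit-serial restoring division: one quotient bit per iteration, so
 * 32 iterations for 32/32. The remainder falls out of the same loop. */
static udiv32_result udiv32(uint32_t dividend, uint32_t divisor)
{
    udiv32_result r = { 0, 0 };
    uint64_t rem = 0;   /* one extra bit of headroom for the shift */

    for (int bit = 31; bit >= 0; bit--) {
        rem = (rem << 1) | ((dividend >> bit) & 1u);  /* bring down next bit */
        if (rem >= divisor) {                          /* trial subtraction */
            rem -= divisor;
            r.quot |= (uint32_t)1 << bit;              /* quotient bit = 1 */
        }
    }
    r.rem = (uint32_t)rem;
    return r;
}
```

    For example, udiv32(100u, 7u) gives quot == 14 and rem == 2. As written, a zero divisor yields quot == 0xFFFFFFFF and rem == dividend, so callers should test for zero first.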

    Chris
  • Altera_Forum

    Ya, there's no problem with implementing the multiplier asynchronously. I'm not too familiar with the DSP blocks in Cyclone, but in Stratix you get input/output registers within the block, so they don't require extra LEs if you want to make the block synchronous.

    I'm assuming the divider you used is the MegaWizard one as well, with the divisor and dividend in and the quotient and remainder out. I was using that in a design in the meantime, before I made my own synchronous divider (I wanted something around 1/5th the size in LEs).

    Watch out for the answers coming out of that block. It has two modes. I'd tell you exactly what they are, but I still haven't installed Quartus because I've been busy around the house. Basically, one mode gets you the proper answer (the remainder is always positive), and the other gives you sign-dependent remainders (which can give you magnitudes that you don't normally expect).

    If you were unaware of this, I would throw it into a simulation by itself and try different values to make sure you don't run into bugs later down the road (that divider wouldn't be the first place you'd look, I imagine). Or check the documentation for lpm_divide in Quartus; near the bottom it shows what results to expect from the different modes (keep an eye on the remainder, or you might miss the difference between the two modes).
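    The two remainder conventions can be illustrated in C. Since C99, integer division truncates toward zero, so the remainder takes the sign of the dividend (a sign-dependent remainder); the sketch converts that into the convention where the remainder is never negative, adjusting the quotient so that a == quot * b + rem still holds. Whether each lpm_divide mode matches exactly one of these is an assumption to check against its documentation, and the helper names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    int32_t quot;
    int32_t rem;
} sdiv32_result;

/* Sign-dependent convention: C99 division truncates toward zero, so the
 * remainder has the sign of the dividend. */
static sdiv32_result sdiv_truncated(int32_t a, int32_t b)
{
    sdiv32_result r = { a / b, a % b };
    return r;
}

/* Non-negative remainder convention: move the remainder into [0, |b|)
 * and correct the quotient so that a == quot * b + rem still holds. */
static sdiv32_result sdiv_rem_nonneg(int32_t a, int32_t b)
{
    sdiv32_result r = sdiv_truncated(a, b);
    if (r.rem < 0) {
        if (b > 0) { r.rem += b; r.quot -= 1; }
        else       { r.rem -= b; r.quot += 1; }
    }
    return r;
}
```

    For -7 / 2, sdiv_truncated gives quot == -3, rem == -1, while sdiv_rem_nonneg gives quot == -4, rem == 1: the quotient changes too, which is easy to miss if you only watch the remainder. Neither helper handles b == 0 or INT32_MIN / -1, which are undefined in C.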

    Cheers.