Hi everyone, I have a question about multipliers. I am new to FPGAs, and I wonder what the difference is between a multiplier constructed with * in HDL and one generated by an IP core. What is the most important difference between these two kinds of multipliers? Speed, or something else? Thanks.

I assume by IP core you mean "lpm_mult"? Using lpm_mult, you have more control over the multiplier unit being targeted. When you type '*' into your code, you are at the mercy of what the synthesis engine picks for you. So for portable coding '*' is the better approach, but when you are tuning your design for performance or area you might need to resort to lpm_mult. I recommend using '*' and, when that doesn't give you the results you are looking for, replacing it with an instantiation of lpm_mult.

Thanks for your suggestion. You mean that I can use * for the initial design and, if the synthesis result can't meet the target, use lpm_mult instead in a second pass to improve the result?

An example of when to switch to lpm_mult is when you find that features of the DSP/embedded multiplier block you were counting on are not being used by the inferred multiplier. The synthesis engine cannot always map every feature of the hardware multipliers, and lpm_mult gives you the ability to do so. In the Quartus II handbook there is a chapter called something like "HDL coding guidelines". It probably does a better job of explaining this in the section about multiplication. I try to avoid using the LPMs whenever possible, since I often create IP for different FPGA families and the hard block characteristics sometimes differ. The options of lpm_mult may therefore vary between families, which makes your implementation less portable (you may not care about portability, though).
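To make the two approaches concrete, here is a minimal sketch of the same signed multiply written both ways. The module names, the WIDTH parameter, and the single pipeline stage are assumptions chosen for illustration; the lpm_mult parameters and ports shown are the standard ones, but which options are available can differ between device families, so check the megafunction documentation for your target.

```verilog
// Inferred version: portable, the synthesis tool picks the implementation.
module mult_inferred #(
    parameter WIDTH = 18                  // assumed operand width, illustration only
) (
    input                           clk,
    input  signed [WIDTH-1:0]       a,
    input  signed [WIDTH-1:0]       b,
    output reg signed [2*WIDTH-1:0] p
);
    always @(posedge clk)
        p <= a * b;                       // full-width signed product, 1 cycle of latency
endmodule

// Explicit version: the same function through the lpm_mult megafunction.
module mult_lpm #(
    parameter WIDTH = 18                  // assumed operand width, illustration only
) (
    input                       clk,
    input  signed [WIDTH-1:0]   a,
    input  signed [WIDTH-1:0]   b,
    output signed [2*WIDTH-1:0] p
);
    lpm_mult #(
        .lpm_widtha         (WIDTH),
        .lpm_widthb         (WIDTH),
        .lpm_widthp         (2*WIDTH),
        .lpm_representation ("SIGNED"),
        .lpm_pipeline       (1)           // one internal register stage
    ) u_mult (
        .clock  (clk),
        .dataa  (a),
        .datab  (b),
        .result (p)
    );
endmodule
```

Both modules have the same ports and latency, so one can be swapped for the other without touching the surrounding logic.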

This post is timely, because I have been having an issue related to it. For larger multipliers, lpm_mult creates logic that is much faster. In my case of a signed 32x32 multiply, lpm_mult is double the speed of using "*" in Verilog. For reference, here is my code:

module mult_test(
    input                    CLK,
    input  signed [31:0]     IN_A,
    input  signed [31:0]     IN_B,
    output reg signed [63:0] OUT_C
);

    // Verilog version: registered inputs, inferred multiply, registered output
    reg  signed [31:0] IN_A_d1;
    reg  signed [31:0] IN_B_d1;
    wire signed [63:0] mult_result;

    assign mult_result = IN_A_d1 * IN_B_d1;

    always @(posedge CLK) begin
        IN_A_d1 <= IN_A;
        IN_B_d1 <= IN_B;
        OUT_C   <= mult_result;
    end

    /*
    // Altera LPM megafunction version
    // Created as signed 32x32 -> 64-bit multiply with 2 cycles of latency
    // (here OUT_C must be declared as a plain output, not output reg)
    wire signed [63:0] mult_result;
    assign OUT_C = mult_result;

    megafunction_mult megafunction_mult_inst (
        .clock  (CLK),
        .dataa  (IN_A),
        .datab  (IN_B),
        .result (mult_result)
    );
    */

endmodule

I get 90 MHz fmax with the "*" version, and 179 MHz with the lpm_mult version. I would rather use "*" for code portability, but the 50% speed cut is unbearable in my application.

I think it has to do with pipelining. Assuming a dedicated multiplier was generated in both cases, the inferred version cannot be pipelined internally the way the lpm_mult version is.
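One thing that can be tried in portable code is to give the inferred multiplier extra register stages directly after the product, in the hope that the synthesis tool moves them into the multiplier. The sketch below assumes this style; the module name and the STAGES parameter are made up for illustration, and whether the registers are actually absorbed depends on the tool, its version, and the device family.

```verilog
// Inferred signed multiply with registered inputs and a configurable
// number of output register stages. Total latency is 1 + STAGES cycles.
module mult_pipelined #(
    parameter WIDTH  = 32,
    parameter STAGES = 2                  // assumed; must be >= 1
) (
    input                       clk,
    input  signed [WIDTH-1:0]   a,
    input  signed [WIDTH-1:0]   b,
    output signed [2*WIDTH-1:0] p
);
    reg signed [WIDTH-1:0]   a_r, b_r;
    reg signed [2*WIDTH-1:0] pipe [0:STAGES-1];
    integer i;

    always @(posedge clk) begin
        a_r     <= a;                     // input registers
        b_r     <= b;
        pipe[0] <= a_r * b_r;             // product captured in the first stage
        for (i = 1; i < STAGES; i = i + 1)
            pipe[i] <= pipe[i-1];         // remaining stages only delay the result
    end

    assign p = pipe[STAGES-1];
endmodule
```

Comparing the fmax of this module against lpm_mult configured with the same latency gives a fairer picture than comparing designs with different internal register counts.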

What is the difference between * and an IP core multiplier?

12 Replies

Altera_Forum
Honored Contributor
14 years ago
By any chance, is the module you are testing this with at the top level? If so, I suspect your input and output registers are being packed into the I/O. Either assign those inputs and outputs to virtual pins using the assignment editor, or just put a bunch of pipeline stages before and after the multiplication in your HDL file. This will make sure you isolate the multiplier from the I/O. In other words, do this:

Register --> register --> register --> register --> multiply --> register --> register --> register --> register

If this causes your timing problems to go away then don't worry, you won't need that kind of pipelining once you feed the multiplication with on-chip inputs and outputs (and if you do that means the surrounding logic could use some pipelining).
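The register chain described above could be sketched as follows. This is only a benchmarking wrapper under assumed names (mult_isolated, DEPTH); it is not code from the original poster's design, and the depth of four simply mirrors the diagram.

```verilog
// DEPTH registers in front of the multiply and DEPTH registers after it,
// so that registers packed into the I/O cells are not on the multiplier paths.
module mult_isolated #(
    parameter WIDTH = 32,
    parameter DEPTH = 4                   // assumed chain length; must be >= 1
) (
    input                       clk,
    input  signed [WIDTH-1:0]   in_a,
    input  signed [WIDTH-1:0]   in_b,
    output signed [2*WIDTH-1:0] out_c
);
    reg signed [WIDTH-1:0]   a_q [0:DEPTH-1];
    reg signed [WIDTH-1:0]   b_q [0:DEPTH-1];
    reg signed [2*WIDTH-1:0] c_q [0:DEPTH-1];
    integer i;

    always @(posedge clk) begin
        a_q[0] <= in_a;                           // first input stage
        b_q[0] <= in_b;
        c_q[0] <= a_q[DEPTH-1] * b_q[DEPTH-1];    // multiply fed by the last input stage
        for (i = 1; i < DEPTH; i = i + 1) begin
            a_q[i] <= a_q[i-1];                   // input delay chains
            b_q[i] <= b_q[i-1];
            c_q[i] <= c_q[i-1];                   // output delay chain
        end
    end

    assign out_c = c_q[DEPTH-1];
endmodule
```

If fmax improves noticeably with this wrapper, the original limit was probably the I/O paths rather than the multiplier itself.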
Altera_Forum
Honored Contributor
14 years ago
--- Quote Start ---
By any chance, is the module you are testing this with at the top level? If so, I suspect your input and output registers are being packed into the I/O. Either assign those inputs and outputs to virtual pins using the assignment editor, or just put a bunch of pipeline stages before and after the multiplication in your HDL file. This will make sure you isolate the multiplier from the I/O. In other words, do this:

Register --> register --> register --> register --> multiply --> register --> register --> register --> register

If this causes your timing problems to go away then don't worry, you won't need that kind of pipelining once you feed the multiplication with on-chip inputs and outputs (and if you do that means the surrounding logic could use some pipelining).
--- Quote End ---

The full history is that I had a design with a signed 32x32 multiply working fine. That multiply was not at the top level.

Then, Quartus updated to 11.0 and that existing design suddenly failed timing by a huge margin. The failing paths were mostly through the 32x32 multiply. Fmax dropped from 100 MHz to 60 MHz. I tried rebuilding the design in 10.1 and it went back to 100 MHz. Long story short: Altera said they found the problem and it's a bug in 11.0.

They didn't give a workaround, though, so I've been experimenting in the hopes that I can get code that will always meet timing while still being portable. Sadly, no amount of extra pipelining has brought the design back to the Quartus 10.1 fmax in Quartus 11.0.

The fmax numbers from my last post were all based around top-level modules, though. I will try adding the extra registers like you mentioned to see if the numbers change.

Update: Adding three additional series registers to both inputs and three additional series registers to the output increased fmax for both "*" and the Quartus II template code, but only by about 4 MHz.

So now it's up to ~180 MHz for lpm_mult vs ~94 MHz for the "*" code.
