Forum Discussion
- Altera_Forum
Honored Contributor
I assume by IP core you mean "lpm_mult"?
Using lpm_mult gives you more control over which multiplier resource is targeted. When you type '*' into your code, you are at the mercy of whatever the synthesis engine picks for you. So for portable code '*' is the better approach, but when you are tuning your design for performance or area you might need to resort to lpm_mult. I recommend using '*', and when that doesn't give you the results you are looking for, try replacing it with an instantiation of lpm_mult.
- Altera_Forum
Honored Contributor
Thanks for your suggestion.
You mean that I can use '*' for the initial design, and if the synthesis result doesn't meet my target, I can then switch to lpm_mult in a second pass to improve it?
- Altera_Forum
Honored Contributor
One reason to switch to lpm_mult is if you find that features of the DSP/embedded multiplier block you were counting on are not being used when the multiplier is inferred. The synthesis engine cannot always map every feature of the hardware multipliers, and instantiating lpm_mult gives you the ability to use them.
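To make "instantiating lpm_mult" concrete, here is a minimal sketch that instantiates the LPM multiplier directly with parameters, instead of going through a MegaWizard-generated wrapper like the `megafunction_mult` used later in the thread. The module name `mult_lpm_direct` is hypothetical. The parameter and port names follow the lpm_mult megafunction (`lpm_widtha`/`lpm_widthb`/`lpm_widthp`, `lpm_representation`, `lpm_pipeline`, `lpm_hint`; `dataa`, `datab`, `clock`, `result`), and the `lpm_hint` string is the form MegaWizard-generated files typically contain, so treat it as an assumption and compare it against what your own MegaWizard run produces.

```verilog
// Hypothetical sketch: direct lpm_mult instantiation (no MegaWizard wrapper).
// Assumes the Altera LPM library is available to Quartus II synthesis.
module mult_lpm_direct (
    input                CLK,
    input  signed [31:0] IN_A,
    input  signed [31:0] IN_B,
    output signed [63:0] OUT_C
);
    lpm_mult #(
        .lpm_widtha         (32),
        .lpm_widthb         (32),
        .lpm_widthp         (64),          // full-width signed product
        .lpm_representation ("SIGNED"),
        .lpm_pipeline       (2),           // 2 clock cycles of latency
        // Hint in the form MegaWizard usually emits; verify against your own generated file.
        .lpm_hint           ("DEDICATED_MULTIPLIER_CIRCUITRY=YES,MAXIMIZE_SPEED=5"),
        .lpm_type           ("LPM_MULT")
    ) u_mult (
        .clock  (CLK),
        .dataa  (IN_A),
        .datab  (IN_B),
        .aclr   (1'b0),                    // no asynchronous clear
        .clken  (1'b1),                    // clock always enabled
        .sum    (1'b0),                    // unused sum input
        .result (OUT_C)
    );
endmodule
```

Settings such as the pipeline depth and the dedicated-multiplier hint are fixed explicitly here rather than left to inference, which is exactly what may need revisiting when the design moves to another device family.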
In the Quartus II Handbook there is a chapter called "Recommended HDL Coding Styles". It probably explains this better, in the section about inferring multipliers. I try to avoid using the LPMs whenever possible, since I often create IP for different FPGA families and the characteristics of the hard blocks sometimes differ. The lpm_mult options may therefore vary between families, which makes your implementation less portable (though you may not care about portability).
- Altera_Forum
Honored Contributor
This thread is timely, because I have been having a related issue. For larger multipliers, lpm_mult produces much faster logic: for my signed 32x32 multiply, lpm_mult reaches double the fmax of "*" in Verilog. For reference, here is my code:
```verilog
module mult_test (
    input                     CLK,
    input  signed [31:0]      IN_A,
    input  signed [31:0]      IN_B,
    output reg signed [63:0]  OUT_C
);

    // Verilog version: inferred '*' with input and output registers
    reg  signed [31:0] IN_A_d1;
    reg  signed [31:0] IN_B_d1;
    wire signed [63:0] mult_result;

    assign mult_result = IN_A_d1 * IN_B_d1;

    always @(posedge CLK) begin
        IN_A_d1 <= IN_A;
        IN_B_d1 <= IN_B;
        OUT_C   <= mult_result;
    end

    /*
    // Altera LPM megafunction version
    // Created as a signed 32x32 -> 64-bit multiply with 2 cycles of latency.
    // (Declare OUT_C as "output signed [63:0]", without "reg", for this version.)
    wire signed [63:0] mult_result;
    assign OUT_C = mult_result;

    megafunction_mult megafunction_mult_inst (
        .clock  (CLK),
        .dataa  (IN_A),
        .datab  (IN_B),
        .result (mult_result)
    );
    */

endmodule
```

I get 90 MHz fmax with the "*" version and 179 MHz with the lpm_mult version. I would rather use "*" for code portability, but the 50% speed cut is unbearable in my application.
- Altera_Forum
Honored Contributor
I think it has to do with pipelining. With the inferred multiplier you cannot pipeline internally the way you did with lpm_mult, assuming a dedicated multiplier was generated in both cases.
- Altera_Forum
Honored Contributor
I should mention that I copied the code format from "Example 10–2. Verilog HDL Signed Multiplier with Input and Output Registers (Pipelining = 2)" in the Quartus II handbook.
- Altera_Forum
Honored Contributor
To improve speed further, you had better register the inputs and outputs of the multiplier block and also insert registers between the block and the fabric.
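A minimal sketch of that advice using the inferred "*", assuming the same signed 32x32 -> 64-bit product as `mult_test` earlier in the thread: the operands are registered, the product is registered immediately after the multiply (a candidate for the DSP block's own output register), and one more register stage sits between that and the fabric. The module and signal names are hypothetical, and whether Quartus actually packs these registers into the DSP block, and what fmax results, depends on the device family; no timing result is claimed here.

```verilog
// Hypothetical sketch: inferred signed 32x32 -> 64 multiply, 3 cycles of latency.
module mult_pipe3 (
    input                     CLK,
    input  signed [31:0]      IN_A,
    input  signed [31:0]      IN_B,
    output reg signed [63:0]  OUT_C
);
    reg signed [31:0] a_d1, b_d1;   // stage 1: register the multiplier inputs
    reg signed [63:0] prod_d2;      // stage 2: register directly after the multiply

    always @(posedge CLK) begin
        a_d1    <= IN_A;
        b_d1    <= IN_B;
        prod_d2 <= a_d1 * b_d1;     // both operands signed, so this is a signed product
        OUT_C   <= prod_d2;         // stage 3: extra register between the block and the fabric
    end
endmodule
```

Compared with `mult_test`, this costs one extra cycle of latency in exchange for a register boundary on each side of the multiplier.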
- Altera_Forum
Honored Contributor
Making a slow, problematic design portable is never a good idea; it defeats the purpose of portability. Common sense?
- Altera_Forum
Honored Contributor
Try writing it the same way as shown in the multiplier template under the Edit menu. You can find it among the templates at: Verilog HDL --> Full Designs --> Arithmetic --> Multipliers --> Signed Multiply with Input and Output Registers.
- Altera_Forum
Honored Contributor
--- Quote Start ---
Try writing it the same way as shown in the multiplier template under the Edit menu. You can find it here under the templates: Verilog HDL --> Full Designs --> Arithmetic --> Multipliers --> Signed Multiply with Input and Output Registers.
--- Quote End ---
I just tried that, and it gave me the same result as my own hand-written Verilog. So, the current fmax summary:
lpm_mult with two-cycle latency: 180 MHz
Verilog * operation with input and output registers: 90 MHz
Quartus II Verilog signed multiply with I/O registers template: 90 MHz
My .sdc file is set up to target 200 MHz in every case.