Altera_Forum
Honored Contributor
17 years ago
Thanks FvM and nobody1234. But I am still confused.
In my question there are two important issues:

1) Can one multiplication or division be completed in one clock cycle? My computation contains tens of multiplications and divisions, so my pipeline idea is to cut the long formula into single operations, i.e. to put a pipeline register after each operation rather than pipelining inside one operation. So I'd like to use parallel multipliers and dividers and set the LPM_MULT or LPM_DIVIDE pipeline stages to 0. For example, to compute f = (x[7:0]*y[3:0])/z[4:0], the simple Verilog code is:

    module test(...); // io declaration...

      // Pipeline registers between the operations, updated on the rising clock edge
      always @(posedge clk) begin
        pipeline0 <= xy;        // registered product x*y
        pipeline1 <= xyz;       // registered quotient (x*y)/z
        f         <= pipeline1;
      end

      // Combinational multiplier (lpm_pipeline = 0): xy = x * y
      lpm_mult lpm_mult_component (
        .dataa  (x[7:0]),
        .datab  (y[3:0]),
        .clock  (clk),
        .result (xy),
        .aclr   (1'b0),
        .clken  (1'b1),
        .sum    (1'b0));
      defparam
        lpm_mult_component.lpm_hint = "MAXIMIZE_SPEED=5",
        lpm_mult_component.lpm_pipeline = 0,
        lpm_mult_component.lpm_representation = "UNSIGNED",
        lpm_mult_component.lpm_type = "LPM_MULT",
        lpm_mult_component.lpm_widtha = 8,
        lpm_mult_component.lpm_widthb = 4,
        lpm_mult_component.lpm_widthp = 12;

      // Combinational divider (lpm_pipeline = 0): xyz = pipeline0 / z
      lpm_divide divider1 (
        .denom    (z[4:0]),
        .clock    (clk),
        .numer    (pipeline0[11:0]),
        .quotient (xyz),
        .remain   (),
        .aclr     (1'b0),
        .clken    (1'b1));
      defparam
        divider1.lpm_drepresentation = "UNSIGNED",
        divider1.lpm_hint = "LPM_REMAINDERPOSITIVE=TRUE",
        divider1.lpm_nrepresentation = "UNSIGNED",
        divider1.lpm_pipeline = 0,
        divider1.lpm_type = "LPM_DIVIDE",
        divider1.lpm_widthd = 5,
        divider1.lpm_widthn = 12;

    endmodule

Because the operand widths and the resulting logic complexity differ, the two pipeline steps above will have different calculation latencies. Some operations may produce a stable output within one clock period, and some may need two. I will chain tens of multiplications and divisions, and if the latency of even one step exceeds one clock period, my pipeline idea is useless. How can I overcome this issue?

2) Input alignment in the pipeline. In each operation such as x*y or x/y, x and y must arrive at the same clock edge.
Assume I build a 20-level pipeline: must I delay some variables 20 times so that they reach the last pipeline stage at the right time? Is there a good way to delay one variable by 20 clocks, like:

    a_1  <= a;
    a_2  <= a_1;
    ...
    a_20 <= a_19;

Otherwise I must define a lot of intermediate registers to pipeline it.

As for nobody1234's idea, I think it is good for a single operation, but it is not efficient for tens of levels of multipliers or dividers. It costs too many FPGA resources.
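
The delay chain written out above (a_1 <= a; ... a_20 <= a_19;) could also be expressed as one parameterized shift register instead of 20 separately named registers. The following is only a minimal sketch under stated assumptions: the module name delay_line and the parameters WIDTH and DEPTH are hypothetical and do not come from the original code, and the chain has no reset or clock enable.

```verilog
// Hypothetical helper (not from the original post): delays d by DEPTH clock cycles.
// WIDTH and DEPTH are illustrative parameter names.
module delay_line #(
    parameter WIDTH = 8,   // bit width of the delayed signal
    parameter DEPTH = 20   // delay in clock cycles, must be >= 1
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] d,
    output wire [WIDTH-1:0] q
);
    // One register per stage: stage[0] plays the role of a_1,
    // stage[DEPTH-1] plays the role of a_20 when DEPTH = 20.
    reg [WIDTH-1:0] stage [0:DEPTH-1];
    integer i;

    always @(posedge clk) begin
        stage[0] <= d;
        for (i = 1; i < DEPTH; i = i + 1)
            stage[i] <= stage[i-1];   // each stage takes the previous stage's old value
    end

    // Output of the last stage: d as it was DEPTH rising edges ago
    assign q = stage[DEPTH-1];
endmodule
```

It might be instantiated as delay_line #(.WIDTH(8), .DEPTH(20)) dly_a (.clk(clk), .d(a), .q(a_20)); so that only the delayed signal needs a name. Whether such a chain ends up as individual flip-flops or is mapped onto memory-based shift registers depends on the synthesis tool, its settings, and the device, so the resource cost should be checked in the fitter report.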