Parallel adder timing issues

Question

I am a beginner in FPGA design and implementation, and I have two questions about adder implementations. The simulations mentioned below are gate-level simulations using the SDO file generated by Quartus.

I am working on a Stratix II FPGA with Quartus. My design needs to run with a 266 MHz clock, so I am looking for a fast adder with 1 clock of latency (3.75 ns). I tried Parallel_ADD from the MegaWizard Plug-In Manager, then implemented it and simulated it in ModelSim. In ModelSim I see the output after 3 clock cycles, and the timing report shows 6.604 ns as the critical time period for the worst path.

I need a two-input, 10-bit adder that runs at 266 MHz on the device above. Can anyone suggest implementation approaches, the timing constraints I need to set during implementation, and so on? Your help is appreciated.

My second question: after implementation, the critical timing path is about 6.604 ns (Tco). Does that mean this design will only work at about 150 MHz? When I run the simulation in ModelSim, the output of the two-input adder is available only after 3 clocks. The testbench is modeled to run at 266 MHz, so the latency is 3 clocks, whereas the timing critical path shows 6.604 ns. According to the FPGA timing summary, the data should be available at 7.5 ns in simulation, but I see data on the output only after 9 or 10 ns. Is this a correlation problem between the EDA tools? Any suggestions?

Is there another way to calculate the timing critical path and its delay, and from that the maximum frequency, so that it matches the simulation results?

Regards,

Sam

altera_forum · Answer

For something like an adder, I would just start coding it in HDL and see what happens. If there's something you can't do in HDL, or that doesn't get synthesized the way you want, then megafunctions tend to be better. But an adder should be no problem.

Are you using the Classic Timing Analyzer or TimeQuest? Since you quoted a 6.604 ns Tco, I'm guessing you're using the Classic Timing Analyzer and looking at the delay to an output pin. When benchmarking small functions like this, make sure you're looking "inside the registers": users often wrap these functions with input and output registers so that only the internal path is measured. Your I/O timing is a separate matter that should be analyzed independently. My guess is that in your case the adder itself is fast, and you are looking at the timing from the last register out to an output pin, which is not what you want to analyze for this function.

altera_forum · Answer

Hi Sam:

I don't know your entire system, but by your description, I would look in the following areas:

1) Doing the add at 266 MHz should be fine as long as both the inputs and the outputs are registered. Although you can build this adder with the megafunction wizard, it is a very basic function that I would suggest you write in straight Verilog or VHDL. In Verilog it would look like the following:

```verilog
module adder10bit (
    input                clk_i,
    input                reset_i,
    input  signed [9:0]  A_i,
    input  signed [9:0]  B_i,
    output signed [10:0] Sum_o
);

    reg  signed [9:0]  A_r;
    reg  signed [9:0]  B_r;
    reg  signed [10:0] Sum_r;
    wire signed [10:0] Sum_c;

    assign Sum_o = Sum_r;
    assign Sum_c = A_r + B_r;  // Actual adder; signed operands are sign-extended to 11 bits

    // Input and output registers; nonblocking assignments model flip-flops.
    always @(posedge clk_i)
    begin
        if (reset_i)
        begin
            A_r   <= 10'd0;
            B_r   <= 10'd0;
            Sum_r <= 11'd0;
        end
        else
        begin
            A_r   <= A_i;
            B_r   <= B_i;
            Sum_r <= Sum_c;
        end
    end

endmodule
```

Once you synthesize this block with the correct timing constraints, it should be able to run at 266 MHz on Stratix II without any problem.

It will have a latency of 2 clocks, but it can produce a new result every clock cycle.
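To put numbers on this, here is the register-count arithmetic, assuming the 3.75 ns period from the original question (roughly 266 MHz). It counts register stages only and leaves out any output-pin delay:

$$
\begin{aligned}
T &= \frac{1}{f} \approx \frac{1}{266\ \text{MHz}} \approx 3.76\ \text{ns} \qquad (3.75\ \text{ns corresponds to } 266.7\ \text{MHz}) \\
t_{\text{latency}} &= N \cdot T = 2 \times 3.75\ \text{ns} = 7.5\ \text{ns} \\
\text{throughput} &= \frac{1\ \text{result}}{T} \approx 266 \times 10^{6}\ \text{results/s}
\end{aligned}
$$

The 7.5 ns here is the figure Sam expected to see in simulation. Pipelining raises the latency in cycles, but throughput stays at one new sum per clock edge.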

My guess is that your primary delay is in the input or output paths: I/O buffers have a lot of delay, so by the time the data reaches the adder, you have already used up most of your clock cycle. If 1-cycle latency is necessary, you can try replacing the module above with this one:

```verilog
module adder10bit (
    input                clk_i,
    input                reset_i,
    input  signed [9:0]  A_i,
    input  signed [9:0]  B_i,
    output signed [10:0] Sum_o
);

    reg  signed [10:0] Sum_r;
    wire signed [10:0] Sum_c;

    assign Sum_o = Sum_r;
    assign Sum_c = A_i + B_i;  // Actual adder, fed directly from the inputs

    // Only the sum is registered, so the latency is 1 clock.
    always @(posedge clk_i)
    begin
        if (reset_i)
            Sum_r <= 11'd0;
        else
            Sum_r <= Sum_c;
    end

endmodule
```

This will have the 1-cycle latency you want, but now the cycle time is limited by the input data path. If there is a lot of combinational logic in that path, you could be stuck.
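A rough timing budget shows the difference between the two versions. This is a simplified sketch that ignores clock skew, and the symbols are generic placeholders rather than Quartus report names: t_ext is the external input delay (upstream device plus board), t_IO the input buffer and routing to the adder, t_add the adder logic and routing, t_su the register setup time, and t_co a register's clock-to-out. With only Sum_r registered, the whole pin-to-register path has to fit in one period. With A_r and B_r added, the adder only competes with one register's clock-to-out and setup, and the pin-to-register path gets a cycle of its own.

$$
\begin{aligned}
\text{1-cycle version:}\quad & t_{\text{ext}} + t_{\text{IO}} + t_{\text{add}} + t_{\text{su}} \le T \\
\text{2-cycle version:}\quad & t_{\text{co}} + t_{\text{add}} + t_{\text{su}} \le T, \qquad t_{\text{ext}} + t_{\text{IO}} + t_{\text{su}} \le T
\end{aligned}
\qquad (T = 3.75\ \text{ns})
$$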

Always make sure your clocks are defined in your SDC file. If the clock is not defined properly, the design could fail timing simply because synthesis is not optimizing the path for that high a speed.

Hope this helps.

Pete

altera_forum · Answer

If you're doing a schematic, then megafunctions are probably the way to go, but I think parallel_add is designed more for multiple adders in parallel than for just a single adder. If you're doing HDL, then with the file open, Edit -> Insert -> Template has some decent examples.

altera_forum · Answer

Thanks for your answers. I also tried the RTL adder instead of the megafunction. Even then, at 266 MHz the simulation output appears after 3 clock cycles.

Can anyone explain how to find the module's operating frequency in the Altera FPGA reports? My assumption is that the critical path gives a rough estimate of the clock frequency. Can anyone explain how to find the critical path of a design?

I am using the Classic Timing Analyzer, and the only constraint I specify is a clock frequency of about 266 MHz. Are there any constraints that will help meet timing and give better optimization? Currently Tco is around 6.5 ns, which corresponds to about 154 MHz.

--Sam

altera_forum · Answer

There are three different things:

1) Tco -> This is the clock-to-output time. Again, I'm assuming you're going to put more logic around the adder, so this path should be ignored, since your adder output won't go directly out. (At a minimum, you will want to add another set of registers so they can be placed in the I/O cells for better timing. You would also need to use a PLL.) But Tco can't be equated to an Fmax, because it's only part of the path. If it takes 6.5 ns from a clock entering the device to data going out, that data will have to be clocked in by some other device. So you'll have board delay and the setup time of the other device, making the path even slower. You'll also have clock skew across the board, which can hurt or help. Until you factor all of these things in, there is no way to equate Tco to an Fmax.

2) Internal paths. These are register to register, and since Classic TAN knows the clock feeding both registers, it can do a full calculation and will report an Fmax. (There are cases where Fmax doesn't make sense, such as paths between clock domains, so it's recommended not to always think in terms of Fmax, but for a single clock domain it's generally all right.)

3) Finally, there is latency, which is the number of registers the data passes through in the device. If you put down IP with three registers along the data path, it takes three clock cycles. If you're doing RTL, you can look at the code and know exactly how many clock cycles it takes to get through. Of course, if you're looking at a timing simulation, the 6.5 ns Tco tacked onto the end may span multiple clock cycles, even if the data "got out" of the last register a few cycles earlier.
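The three items above can be written as rough formulas. This is a simplified sketch assuming a single clock with period T = 3.75 ns. The symbols are generic placeholders, not Quartus report fields: t_co is clock pin to output pin, t_co,reg is a register's internal clock-to-out, t_board the trace delay, t_su,rx the receiving device's setup time, t_data the logic and routing between two registers, and t_skew the capture-clock arrival minus the launch-clock arrival. Line (1) is the board-level output check, (2) the internal register-to-register Fmax, and (3) the time the output pin changes in a timing simulation for N registers in the path, with the inputs captured at t = 0.

$$
\begin{aligned}
&(1)\quad t_{\text{co}} + t_{\text{board}} + t_{\text{su,rx}} \le T + t_{\text{skew}} \\
&(2)\quad f_{\max} = \frac{1}{t_{\text{co,reg}} + t_{\text{data}} + t_{\text{su}} - t_{\text{skew}}} \\
&(3)\quad t_{\text{pin}} = (N-1)\,T + t_{\text{co}} = 3.75\ \text{ns} + 6.5\ \text{ns} = 10.25\ \text{ns} \approx 2.7\,T \qquad (N = 2)
\end{aligned}
$$

Dropping the board, setup, and skew terms from (1) leaves T >= t_co, which is how 6.5 ns turns into about 154 MHz; that is an optimistic bound on the output path only, not the adder's Fmax, which comes from (2). The numbers in (3) are an illustration assuming the two-register RTL version and a Tco of about 6.5 ns: a testbench sampling on rising edges would first see the new sum at the 11.25 ns edge, three clocks after the inputs were captured, even though the RTL has only two registers. That is one plausible reading of the 3-clock, 9 to 10 ns observation earlier in the thread, not a confirmed diagnosis.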
