Forum Discussion

Altera_Forum
Honored Contributor
14 years ago

Constraining external ripple clocks correctly on external pin (pad_io)

Regarding constraining of external clocks generated by registers.

I have now spent several days trying to write correct constraints for my external clock, which is generated by a DDR register driven by a PLL clock.

The external clock must have the correct phase relative to the data signals so that the analysis is correct and I can close timing, as my design is tight,

because my data bus is really bidirectional. But bidir is not the issue here; to simplify the case I just pretend ssync_tx_data is my address bus.

So my design is the same as shown in the Wiki document Source_Synchronous_Timing.pdf, case 3, page 15, published 30 Aug 2011.

But the TimeQuest generated clock does NOT have the correct phase compared to the TimeQuest datasheet report, nor compared to the timing shown in a gate-level simulation.

The difference is quite big: something like 1 ns for the fast corner and about 3 ns for the slow corner. To me it looks like the delay from the ddr_tx_clk input clock to the pin (pad_io) is not included.

However, for ssync_tx_data the delay all the way to the pad_io IS included!

It is possible to extract the missing delay manually and add it as an offset to the create_generated_clock statement, but the delay differs between slow and fast, so adding it manually does not reflect the real world!

The proper slow and fast delays must be included by TQ to be correct for both cases.

The constraints from the user guides are:

create_clock -period 6.25 -name fpga_clk [get_ports fpga_clk]

derive_pll_clocks

create_generated_clock -source [get_pins {inst1|altpll_component|auto_generated|pll1|clk[1]}] -name ssync_tx_clk_ext [get_ports {ssync_tx_clk}]

# External device delays

# setup requirement is 1.4 and hold is 0.4ns

set_output_delay -clock { ssync_tx_clk_ext } -min [expr 0.4 + 0.150 - 0.05] [get_ports {ssync_tx_data[*]}]

set_output_delay -clock { ssync_tx_clk_ext } -max [expr 1.4 + 0.150 - 0.05] [get_ports {ssync_tx_data[*]}] -add_delay
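
A quick sanity check is to confirm where each clock actually landed (standard TimeQuest console commands; the clock names follow the constraints above):

```tcl
# Lists every clock with its type, period, and targets; ssync_tx_clk_ext
# should appear as a generated clock targeting the ssync_tx_clk port:
report_clocks

# Flags common constraint problems such as unconstrained ports or
# clocks that did not match their targets:
check_timing
```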

What I see is that the clock phase reported by TQ, and used together with my set_input_delay and set_output_delay constraints, differs from the actual real-world timing (gate-level simulation)!

I believe the generated clock named ssync_tx_clk_ext shows the timing at the input of the DDR register and NOT at the pin (pad_io) ssync_tx_clk!

For the data signal outputs, the timing in TQ seems to fit well with the datasheet report tco and the gate-level timing, but not for the clock.

We need a generated clock that includes the delay from the ddr_tx_clk input clock to the pad_io, which differs a lot between slow and fast silicon, to do a proper analysis.

Is there a way to generate the clock at the actual pin (pad_io)?

Or have I misunderstood something here?

Any help is appreciated

:confused:

15 Replies

  • Altera_Forum
    Honored Contributor

    I'm trying to correlate TimeQuest's timing model to external values and make sure they make sense. In TimeQuest, the set_output_delay constraint says there is an external register being driven by the output port, that register is clocked by the -clock option, and the delay to that register is a -max and -min value. That's how it does timing analysis, and I think it's easiest to understand when you think of it that way.

    When TimeQuest does setup timing analysis, it needs to make sure the data gets to that register before the latch clock. For hold, it needs the data to get there after the latch clock.

    So that's how it's being analyzed, but as you point out, external devices don't usually say what their delays are inside themselves. They might say they have a Tsu of 3ns. One way a device has a Tsu of 3ns is by saying internally its data path is 3ns longer than its clock path, and hence the data must be available at the ports 3ns before the clock is available. That may not be what's happening, i.e. it may be that the paths are equal but there is a PLL that phase-shifts the clock forward 3ns. I don't know, but for all intents and purposes I don't care. When I increase my -max value by 3ns, I am saying that externally the data path is 3ns longer than the clock path, based on TimeQuest's model. So I'm making the datasheet match the model.

    You're right that the -min value is predefined by Synopsys. They could have had an option called set_output_delay -th, and the user could put it in directly. But then when absorbing the board delays, they would need to do max_clk_dly - min_data_dly, i.e. they would have to invert what they did for the max value. That's why I think Synopsys/TQ's way is more consistent. Anything on the data path is always added and anything on the latching clock path is subtracted. For the -max value you use the larger value for the data path and the smaller value for the clock path. For the -min value you do the opposite. But Synopsys could have done it a different way.
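
    As a sketch of that bookkeeping in .sdc form (all numbers and the sign convention here are my reading of the above, not values from this thread; verify against the device datasheet and the User Guide):

    ```tcl
    # External device requirements (hypothetical datasheet values):
    set tsu_ext  1.4   ;# setup requirement of the external register
    set th_ext   0.4   ;# hold requirement of the external register

    # Board trace delays (hypothetical):
    set data_min 0.140 ; set data_max 0.150
    set clk_min  0.05  ; set clk_max  0.06

    # -max: data path terms added at their largest, clock path subtracted
    # at its smallest:
    set_output_delay -clock ssync_tx_clk_ext \
        -max [expr {$tsu_ext + $data_max - $clk_min}] \
        [get_ports {ssync_tx_data[*]}]

    # -min: the opposite extreme for each term; the hold requirement enters
    # with a negative sign under this convention:
    set_output_delay -clock ssync_tx_clk_ext -add_delay \
        -min [expr {-$th_ext + $data_min - $clk_max}] \
        [get_ports {ssync_tx_data[*]}]
    ```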

    So you're right that I don't know what's going on inside the device, and the Tsu is only at the I/O ports of the device, but the description of what's going on inside the device is not wrong, as you get the same analysis if you think of it that way. I'm just trying to help visualize it rather than plug in equations, which is how I think a lot of people get into trouble.

    And there is another way to think of it that you might like. When the -max value gets larger, that is telling the FPGA it needs to get its data out more quickly to meet the setup relationship. So when the -max value becomes 3ns, you're telling the FPGA to have its data available 3ns before the clock. Thinking of it that way does not imply anything about what's going on inside the FPGA (but when you do timing analysis, you will see the data path has become 3ns longer).

    I think I understand your point, but I also don't think I said anything wrong either.
  • Altera_Forum
    Honored Contributor

    Hi Rysc

    Thanks for clearing up the hold time.

    After studying your case 3 example in detail I finally realized I had completely misunderstood the way TQ handles ripple clocks.

    Instead of generating a clock at the flip-flop output (in this case the pad IO), it generates the clock on the clock input of the FF. Then it takes the delay from this clock to the actual pad IO and moves it into the data delay as a negative delay! Very simple when you get it, but very different from what I expected.

    The naming of the generated clocks as *_ext in the docs does not make this easier to see;

    I would prefer *_int, but never mind, it is just a name.

    I think it would help many people if you added a chapter to your source-synchronous document that spells this out (cut it out in carbon, as we say in Danish).

    So why did Synopsys make it this way? I can only guess.

    The downside is that when I look at waveform signals in TQ, they do not represent the actual clock signal that drives the external chip.

    However, I see that it has one advantage: if you include delays on clock nets that vary a lot between FAST and SLOW hardware, you risk that a clock edge falls before another clock edge in FAST but after it in SLOW. That makes it difficult to decide which clock edge is the right one to use in the analysis. By using fewer signals as clocks, and sticking to global nets and phase-aligned PLL outputs, this problem is minimized and the analysis is simpler.
  • Altera_Forum
    Honored Contributor

    To be honest, I didn't follow the first part. The latching clock for case 3 should start at the clock coming into the FPGA, go through the PLL, the global clock tree, any ripple clocks on the path (which would have a generated clock assignment), and finally to the output port driving the clock out (which also has a generated clock on it). I call this final generated clock *_ext because it is what is used to clock the external register being driven by our output data ports.

    Anyway, it sounds like you have it and are comfortable; I'm just trying to understand the confusion better to see if there's a better way to explain it (or cut it out in carbon).
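
    A sketch of that chain in .sdc form (the ripple divider and its instance name clk_div are hypothetical additions just to show where such a clock would slot in; case 3 itself goes straight from the PLL to the DDR output):

    ```tcl
    # Base clock at the FPGA input port:
    create_clock -period 6.25 -name fpga_clk [get_ports fpga_clk]

    # Generated clocks on every PLL output:
    derive_pll_clocks

    # A ripple clock on the path would get its own generated clock, sourced
    # from the pin that clocks the divider register (hypothetical names):
    create_generated_clock -name tx_div_clk -divide_by 2 \
        -source [get_pins {clk_div|clk}] \
        [get_pins {clk_div|q}]

    # Finally the *_ext clock on the output port, used to clock the external
    # register and referenced by set_output_delay on the data ports:
    create_generated_clock -name ssync_tx_clk_ext \
        -source [get_pins {inst1|altpll_component|auto_generated|pll1|clk[1]}] \
        [get_ports {ssync_tx_clk}]
    ```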
  • Altera_Forum
    Honored Contributor

    Hi Rysc

    I think it is important to explain what confused me, and probably others, when starting to use TQ.

    So I will try again in another way; please ask or correct me if you think I still do not understand it correctly.

    From case 3:

    create_generated_clock -source [get_pins {inst1|altpll_component|auto_generated|pll1|clk[1]}] -name ssync_tx_clk_ext [get_ports {ssync_tx_clk}]

    When I viewed the above clock in the TQ waveform, I expected it to show the signal as it will appear on the hardware pin ssync_tx_clk: it should have a delayed phase compared to sys_clk_90shift.

    But TQ does not do that. Instead it shows the clock named ssync_tx_clk_ext as it is on the outclock input of ddr_tx_clk, which is in the same phase as sys_clk_90shift, actually the exact same phase.

    So I did not understand how the analysis could be correct, and I tried to add the delay to the pad_io manually, because I thought TQ did not support ripple clocks properly. Finally I realized that TQ moves this clock delay into the data delay as a negative delay when displayed in the TQ waveform.
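
    A toy numeric sketch of that bookkeeping (all delays hypothetical, one corner only):

    ```tcl
    # On-chip delays from the FF clock inputs out to the pads:
    set data_to_pad 4.0 ;# DDR data register clock input -> ssync_tx_data pad
    set clk_to_pad  2.0 ;# ddr_tx_clk clock input -> ssync_tx_clk pad

    # The waveform view draws the generated clock at the FF clock input
    # (no pad delay) and shifts the data by the difference instead:
    set displayed_data_delay [expr {$data_to_pad - $clk_to_pad}] ;# 2.0 ns

    # The pad-to-pad skew on real hardware is the same 2.0 ns, so the
    # analysis result is unchanged even though the displayed clock edge is
    # not the edge you would probe on the pin.
    ```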
  • Altera_Forum
    Honored Contributor

    So yes, the generated clock will look identical to sys_clk_90shift, because that's what it is based on and there is no -phase or anything. Remember that the clock launch and latch edges are "ideal", i.e. how the clocks are described in the .sdc before any place-and-route is done. If the time to get off chip were 2ns or 200ns, it wouldn't have any effect on where the launch and latch edges are.

    My concern is the last statement, that the clock delay is moved into the data delay. The clock delay is still part of the latch clock path. So in the waveform view, the second dotted line labeled "clock delay" is the time it takes to get off chip, and it starts from the latch clock time.

    I prefer looking at the Data Path tab, which has more information but is harder to visualize. If you look at the bottom Data Required Path window, it starts with the Latch Edge, which is the same as the sys_clk_90shift clock. If you follow the location column, you'll see that the clock is described as coming into the FPGA at input port fpga_clk, then going through the PLL, the global network, and the DDR output, and finally out of ssync_tx_clk. The User Guide briefly discusses this, but generated clocks should always start way back at the base clock, so the analysis can account for any delays getting to the generated clock.

    Anyway, take a look at that and see if it makes sense. In the end, the Data Path tab has the Data Arrival Path, which is how long your data takes to get out, and the Data Required Path, which is how long the clock takes to get out. The latter also has the output delay subtracted from it. Hopefully that makes sense.
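
    For reference, the same Data Arrival / Data Required breakdown can be pulled from the Tcl console with a full-path report (port names as used in this thread):

    ```tcl
    # One setup path to the data ports with the clock networks expanded;
    # the Data Required Path walks from fpga_clk through the PLL and the
    # DDR clock output register out to the ssync_tx_clk pad:
    report_timing -setup -to [get_ports {ssync_tx_data[*]}] \
        -npaths 1 -detail full_path -panel_name "ssync tx setup"
    ```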