Forum Discussion

Altera_Forum
Honored Contributor
17 years ago

set_output_delay -min / -max does not have intended effect

Hi,

Per earlier recommendations I have switched to TimeQuest in order to correctly constrain the output signals in my design. The FPGA is a Cyclone II and I'm using Quartus II 7.1.

I'm trying to achieve correct timing for a source synchronous output bus that drives an external FIFO. I have a 48 MHz clock, a 16-bit data bus and a 'write'-signal. The external FIFO requires the following setup and hold timings:

data tsu_needed = 4ns

data th_needed = 5ns

write tsu_needed = 13ns

write th_needed = 5ns

Given the source synchronous interface, I'm using the following SDC settings (the complete SDC file is attached):

set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 13 [get_ports {N_SLWR}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -4 [get_ports {N_SLWR}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[15]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[15]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[14]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[14]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[13]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[13]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[12]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[12]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[11]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[11]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[10]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[10]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[9]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[9]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[8]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[8]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[7]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[7]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[6]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[6]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[5]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[5]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[4]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[4]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[3]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[3]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[2]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[2]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[1]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[1]}]
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] 4 [get_ports {DATA_OUT[0]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] -5 [get_ports {DATA_OUT[0]}]
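
The per-bit constraints above all follow one pattern: the -max value is the external device's setup requirement and the -min value is the negated hold requirement. A minimal sketch of the same data-bus constraints written once with a port wildcard follows. It assumes get_ports accepts the DATA_OUT[*] pattern in this Quartus version and that board trace delays can be taken as zero; the variable names are only illustrative and do not come from the attached SDC file.

```tcl
# Illustrative variable names (not taken from the attached SDC file).
set data_tsu 4.0   ;# external FIFO data setup requirement, ns
set data_th  5.0   ;# external FIFO data hold requirement, ns

# -max carries the setup requirement, -min the negated hold requirement.
set_output_delay -add_delay -max -clock [get_clocks {CLK_OUT_48}] $data_tsu [get_ports {DATA_OUT[*]}]
set_output_delay -add_delay -min -clock [get_clocks {CLK_OUT_48}] [expr {-$data_th}] [get_ports {DATA_OUT[*]}]
```

Keeping the datasheet numbers in variables makes it harder for the -min sign or a single bit's value to drift out of step with the rest of the bus.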

Unfortunately, measuring with my logic analyzer attached to LAI pins on the FPGA gives the following timings:

data th_actual = 1.3ns

write th_actual = 2ns

Changing the -min time to something more negative does not affect the hold time a bit.

Any ideas where to start debugging this?

Thanks,

/John.

18 Replies

  • Altera_Forum
    Honored Contributor

    The clocks always relate back to the original input. If it doesn't, the different delays can cancel out. (Let's say your clock and data were created from two different PLLs. If the analysis ignored the delays up to the PLLs, you would have incorrect timing analysis, since they would be slightly different.) This is best practice, and in the case where that delay is the same for the data and clock, the two just cancel out and there is no problem. Note that it doesn't say your clock-to-out is 17ns; that's the information I pulled out of it (and it is adding 13ns of routing delay, so it's somewhat accurate). What TQ is telling you is that when you launch data at time 0ns, it gets out there after the clock launched at time 0ns, including your -min delay requirements, which is what you want.

    I ignored your SLWR signal's requirement, or at least wasn't thinking about it. You've got a roughly 20ns clock period, and this signal chews up 13ns for setup and 5ns for hold, leaving the FPGA to align the clock and data to within about 2ns. I hate to say it, but that's probably not going to happen. You really have too slow a memory. You'll probably either need a faster RAM, or need to make this particular path a multicycle (i.e. when you send data on that line, you don't expect it to reach the destination for 2 clocks). I don't know the behavior of this signal well enough to know what its effect will be.
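
    As a rough check of the window described above, the arithmetic can be written out in plain Tcl. This is only a sketch using the rounded requirements from the first post (13ns setup, 5ns hold). With the exact 48 MHz period the window comes out nearer 2.8ns than 2ns, which does not change the conclusion.

    ```tcl
    set period_ns [expr {1000.0 / 48.0}]   ;# 48 MHz clock period in ns
    set tsu_ns 13.0                        ;# SLWR setup requirement from the first post
    set th_ns   5.0                        ;# SLWR hold requirement from the first post

    # What is left of the period once setup and hold are reserved.
    set window_ns [expr {$period_ns - $tsu_ns - $th_ns}]
    puts [format "period = %.2f ns, window = %.2f ns" $period_ns $window_ns]
    ```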

    And probably the reason doubling your set_output_delay didn't have much effect is that the fitter is already doing everything it can. Like I said, it's adding 13ns of routing delay already, which easily meets timing in the slow corner, but probably doesn't meet by much in the fast corner. This is why "adding delay to meet timing" can only go so far: the slow and fast corners can differ by a lot. When the clock and data paths are aligned, the paths will still vary a lot, but they vary together and everything works out nicely.
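
    The multicycle option mentioned above could look something like the sketch below. It assumes the interface protocol really tolerates N_SLWR being captured one clock later, which has not been established in this thread, so treat it as an illustration of the syntax and not as a recommended fix.

    ```tcl
    # Move the setup check for N_SLWR to the second latch edge after launch.
    set_multicycle_path -setup -to [get_ports {N_SLWR}] 2

    # Move the hold check back by one latch edge, to where it was
    # before the setup check was relaxed.
    set_multicycle_path -hold -to [get_ports {N_SLWR}] 1
    ```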
  • Altera_Forum
    Honored Contributor

    Thanks FvM and Rysc. Your feedback here is much appreciated. I'm still getting up to speed on FPGA best practices and your comments are invaluable.

    What still confuses me is that if I don't constrain my I/O at all, I get approximately the same I/O timing as when I specify the SDC that was posted. It is the hold timing that is not met (around 2-3 ns regardless of whether the I/O is constrained or not). Since the SLWR signal requires 13 ns setup time and 4ns hold time, this is what I put into the SDC. The same goes for the DATA_OUT bus, but that has a much less stringent tsu.

    So, basically, I just want to delay the SLWR and DATA_OUT signals a few ns (2 to 5 would do) so that the th is met. That however doesn't guarantee tsu unless it is also constrained. This brings me back to the original problem of how to set up the SDC.

    I'm hoping you see the problem and can give me some hints on how to properly constrain this I/O. It seems like playing with the clock signal is not the correct method when the data timing is only a couple of ns off.
  • Altera_Forum
    Honored Contributor

    When you say I/O timing, do you mean through the LAI interface or what TimeQuest reports? The path I looked at added a 13ns route to your output path, which is something I have never seen done before, and I am certain it is due to your timing constraints. So I still don't see the problem, as it looks like what is occurring is correct (and impressive, as adding delay used to be very difficult for any FPGA fitter just a few years ago).

    You say adding 2-5ns would do, but you're saying it has to add 2-5ns across all timing models, which is difficult to do. My feeling is that you're trying to do something difficult (interface to a RAM that just isn't made to run at 48MHz), but it might be possible. What are the SLWR signal's slacks for setup in the slow model and hold in the fast model? You could attach those too, but it's a very tight window you're shooting for.
  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    When you say I/O timing, do you mean through the LAI interface or what TimeQuest reports?

    --- Quote End ---

    I'm always measuring my timings with the LAI interface as well as a scope to see the actual output timings. This is mostly because I am not yet comfortable trusting reports from Quartus, but also because I'm not sure where to look for these numbers due to the confusing clock reference used. Measuring timing via LAI has been very accurate over the 2+ years I have used this method. Altera has probably made an effort to keep the LAI pin delays as small as possible to allow this.

    --- Quote Start ---

    The path I looked at added a 13ns route to your output path, which is something I have never seen done before, and I am certain it is due to your timing constraints. So I still don't see the problem, as it looks like what is occurring is correct (and impressive, as adding delay used to be very difficult for any FPGA fitter just a few years ago).

    --- Quote End ---

    I do wonder, however, why 13ns is reported in the first place, since the non-constrained timing (th) was only some 2-3 ns off. It seems to me a delay of 3ns should be sufficient.

    --- Quote Start ---

    You say adding 2-5ns would do, but you're saying it has to add 2-5ns across all timing models, which is difficult to do. My feeling is that you're trying to do something difficult (interface to a RAM that just isn't made to run at 48MHz), but it might be possible. What are the SLWR signal's slacks for setup in the slow model and hold in the fast model? You could attach those too, but it's a very tight window you're shooting for.

    --- Quote End ---

    I don't know the actual slack right now (I'm at my day job). How do I generate text-formatted reports for this? The receiving FIFO is an FX2 USB controller that is rated up to 48 MHz. It requires the timings I stated earlier per the datasheet. I have the option of sourcing the clock from the FX2 CPU, in which case the large 13ns setup requirement on SLWR will go down to around 4 ns. This unfortunately requires more changes to the design.

    It looks more and more like the best route is to simply delay the 48 MHz clock so that the slack is spread evenly over tsu and th. I then should lock in the timing tsu/th with the SDC to constrain the design.
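
    On the question above about text-formatted reports: one option is report_timing with a file argument, run from the TimeQuest Tcl console once the timing netlist is up to date. A hedged sketch follows, assuming this Quartus version supports the -file and -detail options; the output file names are made up for illustration.

    ```tcl
    # Worst setup path to N_SLWR, written to a text file (hypothetical file name).
    report_timing -setup -to [get_ports {N_SLWR}] -npaths 1 -detail full_path -file slwr_setup.txt

    # Worst hold path to the same port.
    report_timing -hold -to [get_ports {N_SLWR}] -npaths 1 -detail full_path -file slwr_hold.txt
    ```

    The resulting files list the launch and latch clock edges alongside the data path, which is also where the clock reference used in the analysis can be read off.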
  • Altera_Forum
    Honored Contributor

    As Rysc mentioned, the timing window for SLWR is small. (Without rounding up the FX2 specifications as you did, I get 4.4 ns, which sounds a bit better.) But I would try to use a structure that has a low delay skew by design.

    I understand that the 48 MHz CLK output is a dedicated PLL output. If SLWR and data are sourced from an output register clocked by the same clock, you get precise timing, but the hold time (relative to the PLL output) is most likely 1 or 1.5 ns too short if CLK and SLWR use the same drive strength. Making the CLK output fast (maximum drive strength) and SLWR slow (lower drive strength) is hopefully sufficient to achieve the required timing. The output delay of about 0.5 ns could be used in addition.
  • Altera_Forum
    Honored Contributor

    Ok, I have now found out that the LAI output DOES add a fair amount of delay compared to the actual output pins. This means Quartus, the Classic Timing Analyzer, TimeQuest, as well as you, have been right all along. Thanks for pointing me in this direction - I had assumed that the LAI had much less latency...

    When measuring directly on the output pins I am meeting the I/O constraints with 1 ns to spare (th for the SLWR signal). The other worst-case slacks range from 2.5 to 3.3ns. I have enabled "multi-corner analysis" in the settings dialog. What other settings (if any) must I change to ensure that my timings are correct for slow/fast devices?

    Thanks,

    /John.
  • Altera_Forum
    Honored Contributor

    If you make timing with multi-corner analysis, that should cover it (you should see 2 or 3 different timing analysis runs; alternatively you can do it manually by running create_timing_netlist with different options, or by creating the netlist once and switching corners with set_operating_conditions). You may want to make your requirements a little bit worse, since your board will add a little more skew between the data and clock, but if you have 1ns to spare, you should be fine.
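
    The manual route described above might be scripted roughly as follows for quartus_sta. This is a sketch only: the project name is a placeholder, the report file names are made up, and the -model option is an assumption about the installed version (older releases may spell it differently, for example -fast_model).

    ```tcl
    project_open my_project            ;# placeholder project name

    foreach model {slow fast} {
        # Build and constrain a netlist for this corner.
        create_timing_netlist -model $model
        read_sdc
        update_timing_netlist

        # Worst setup and hold paths to N_SLWR, one pair of files per corner.
        report_timing -setup -to [get_ports {N_SLWR}] -npaths 1 -file slwr_setup_${model}.txt
        report_timing -hold  -to [get_ports {N_SLWR}] -npaths 1 -file slwr_hold_${model}.txt

        delete_timing_netlist
    }

    project_close
    ```

    Setup is usually tightest in the slow corner and hold in the fast corner, so those two of the four reports are the ones to look at first.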

  • Altera_Forum
    Honored Contributor

    Thanks Rysc. I changed the operating conditions to 'fast' and I now have one failed path:

    Slack: -0.159

    From node: ...dffs[0]

    To node: N_SLWR

    Launch clock: pll_clk_48

    Latch clock: CLK_OUT_48

    I assume this is the 1ns of hold margin that has been eaten up by the faster model.

    How is the above best fixed? I believe I have my design optimized for speed per the design assistant's directions.

    Edit: I fixed the fast model th violation by selecting "Standard fit (highest effort)" in the QII Fitter settings. My design now passes both fast and slow timing models.

    Thanks again,

    /John.