
Altera_Forum
Honored Contributor
9 years ago

Confused with set_input/output_delay command

Hi folks, while I am familiar with set_input/output_delay -max, which specifies the maximum time the data takes to travel to or from the FPGA, I am still very confused by the -min value. I noticed that in most cases we specify a negative value for -min, but what does that actually mean? The data will still take time to travel; the value just has to be smaller than -max, so why is it negative?

16 Replies

  • Altera_Forum

    --- Quote Start ---

    If the data arrives before the specified clock edge, then the delay is negative. The maximum or minimum delays specified could be positive or negative. The only restriction is that max > min.

    --- Quote End ---

    I believe that the data vs clock relationship is different for setup and hold, right? I.e.:

    1) Setup : data vs clock edge at next clock cycle (setup relationship of clock period)

    2) Hold : data vs clock edge at current clock cycle (hold relationship of 0ns)
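
    To make the quoted point about negative delays concrete, here is a minimal sketch. It assumes a hypothetical external device with Tsu = 4.0ns and Th = 1.0ns, a 0.5ns board delay and no clock skew; the clock name ext_clk and the port pattern data_out* are placeholders, not names from this thread. With those numbers, -min works out to board delay minus Th, which is negative whenever Th exceeds the board delay:

    ```tcl
    # Assumed values: Tsu = 4.0ns, Th = 1.0ns, board delay = 0.5ns, clock skew ignored.
    # ext_clk and data_out* are placeholder names.
    create_clock -period 10.0 -name ext_clk   ;# virtual clock for the external device

    # -max = board delay + Tsu = 0.5 + 4.0
    set_output_delay -clock ext_clk -max 4.5 [get_ports data_out*]

    # -min = board delay - Th = 0.5 - 1.0 (negative: the FPGA must hold its
    # output for at least 0.5ns after the edge)
    set_output_delay -clock ext_clk -min -0.5 [get_ports data_out*]
    ```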
  • Altera_Forum

    Yes and no. When we talk about the values for set_input/output_delay, then the clock edges do not matter. But for the entire setup analysis, they do matter. For example:

    Take 10ns clocks, with the FPGA transmitting to an external device that has a Tsu of 4ns. That 4ns is the data_delay - clock_delay inside the external part, and it is 4ns no matter which edges we are talking about.

    Let's say the FPGA has a Tco of 8ns and the board delay is 0.5ns. The default setup relationship is 10ns, i.e. we launch at the 0ns edge and latch at the 10ns edge. The data arrives at 8 + 0.5 + 4 = 12.5ns, so it fails timing with a slack of -2.5ns.

    Now we add a multicycle to move the latch edge to 20ns. The data_delay - clock_delay inside the external part is still 4ns and it still has a Tsu of 4ns. But now we meet timing with 7.5ns of slack.

    Similarly for setup and hold, the edges are used in calculating if we meet timing, but the external delays are usually independent of what those edges are.
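
    As a rough sketch, the example above could be written in SDC as follows. The clock and port names (fpga_clk, ext_clk, clk, data_out*) are placeholders I made up; the delays are the ones from the example. Note that the external delay value stays at 4.5ns in both cases, and only the multicycle changes which latch edge is used:

    ```tcl
    # Placeholder names: fpga_clk, ext_clk, clk, data_out*.
    create_clock -period 10.0 -name fpga_clk [get_ports clk]
    create_clock -period 10.0 -name ext_clk   ;# virtual clock for the external device

    # External delay = board delay + Tsu = 0.5 + 4.0, regardless of the edges used.
    set_output_delay -clock ext_clk -max 4.5 [get_ports data_out*]

    # Default: launch at 0ns, latch at 10ns.
    #   required = 10.0 - 4.5 = 5.5ns, arrival = Tco = 8.0ns, slack = -2.5ns
    # With the setup multicycle the latch edge moves to 20ns.
    #   required = 20.0 - 4.5 = 15.5ns, slack = 15.5 - 8.0 = 7.5ns
    set_multicycle_path -setup -end -to [get_ports data_out*] 2

    # Keep the hold check at the 0ns edge instead of letting it follow to 10ns.
    set_multicycle_path -hold -end -to [get_ports data_out*] 1
    ```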
  • Altera_Forum

    Now that makes more sense: iExt and oExt are independent of the clock edges (setup/hold relationship), and the edges are mainly used for calculating slack.

    What if I have a scenario where I am connecting FPGA to FPGA? How do I calculate iExt and oExt, since Tco, Tsu and Th are not fixed?
  • Altera_Forum

    That one is a pain because both sides are flexible. Here's the method I use. In the example, FPGA_A drives to FPGA_B and the clock period will be 10ns.

    1) I start by overconstraining the driving FPGA first. You could do either one, but that one is usually easier to understand. So something like:

    create_clock -period 10.0 -name FPGA_A_clk [get_ports A_clk]

    create_clock -period 10.0 -name ext_B_clk

    set_output_delay -clock ext_B_clk -max 10.0 [get_ports A_outputs*]

    So I have a 10ns setup relationship and I say the external device has a 10ns delay, which means I am asking the FPGA to have a Tco of 0ns. It will fail timing, but it will try to get as fast as it can.

    2) I compile and see that it fails with a slack of -6ns, i.e. FPGA_A has a Tco of 6ns. So now I loosen the requirement to get that to pass:

    set_output_delay -clock ext_B_clk -max 4.0 [get_ports A_outputs*]

    I may also add a simple hold constraint at this point. Hold is usually not a concern, since the hold requirement is 0ns and as long as the delay across the interface is greater than 0 (or greater than the clock skew, if there is large skew), we'll meet timing. But something like:

    set_output_delay -clock ext_B_clk -min 0.0 [get_ports A_outputs*]

    (You could also look at the worst-case hold slack and constrain based on that. So if it makes timing by 2.5ns, then do a set_output_delay -min -2.5, which means FPGA_A must have an output delay greater than 2.5ns to meet timing. But do not do hold until you've gotten the best setup, as you don't want the fitter to add delay to meet hold at the expense of setup.)

    I recompile and make sure it is still meeting timing.

    3) At this point I have fixed one side. I then constrain FPGA_B, setting its external delays based on FPGA_A:

    create_clock -period 10.0 -name FPGA_B_clk [get_ports B_clk]

    create_clock -period 10.0 -name ext_A_clk

    set_input_delay -clock ext_A_clk -max 6.0 [get_ports B_inputs*]

    set_input_delay -clock ext_A_clk -min 2.5 [get_ports B_inputs*]

    If FPGA_B meets timing, technically you're done. To add board delays, I would just add them to the side that has slack, i.e. if FPGA_A barely meets its 6ns Tco requirement, but FPGA_B has 500ps of slack and the board delay is 300ps, then increase FPGA_B's max input delay to 6.3ns.

    Often you have lots of slack, and we've constrained A to be as tight as possible. Let's say the clock period is 20ns, and after following these steps FPGA_B has 8ns of slack, while FPGA_A has hardly any. You can loosen the requirements on A and tighten them on B by the same amount, but it's not necessary.
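
    For the 300ps board-delay case just described, the adjusted FPGA_B constraints might look like the sketch below. Leaving -min at 2.5 is my assumption: a smaller minimum input delay is the more pessimistic value for hold, so not adding the board delay there errs on the safe side.

    ```tcl
    # FPGA_A Tco (6.0ns) + board delay (0.3ns), absorbed by FPGA_B's 500ps of slack.
    set_input_delay -clock ext_A_clk -max 6.3 [get_ports B_inputs*]

    # Assumption: keep the hold-side value unchanged (conservative for hold).
    set_input_delay -clock ext_A_clk -min 2.5 [get_ports B_inputs*]
    ```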

    The basic idea, though, is to get one FPGA to "the best timing" that it can reach, then use that to constrain the other one. (There are other scenarios. If the interface is slow, just give them both roughly half the data period and be done with it, as you know it meets timing.)
  • Altera_Forum

    Thanks for the detailed information. That is extremely useful, but I still have a couple of questions. I believe this is for non-source-synchronous transfers. What if I have a transfer like the one in the attached picture?

    1) Aren't the delays fixed for both registers (the last register in FPGA A and the first register in FPGA B, which is in the hard-block DDIO cell) and for the clock paths to both registers? If yes, how does Quartus actually try to optimize the timing (try to get as fast as it can)?

    2) If this is a source-synchronous, edge-aligned transfer, can the same method/order of constraints be applied? If we constrain FPGA B first, how do we then constrain FPGA A, since the setup relationship will be 0ns and the hold relationship will be -1/2 clock period?
  • Altera_Forum

    In some devices there are delay chains between the IO port and the input/output register, so the fitter can modify those to meet timing. But yes, there's generally not a whole lot that happens. With a source-synchronous interface, especially on the output side, the data and clock are generally as closely matched as they can be by default.

    The same method can be applied. In the first step, you may not need any constraint at all.

    For source-synch, the input side (FPGA_B) tends to have more variation. The reason is that the clock goes onto a global clock tree and then drives the input register, while the data comes directly in. They follow two very different paths, so delay chains may be necessary to get the alignment you want, and timing constraints will drive the fitter to set those delay chains.

    For source-synch, I usually try to do symmetric requirements, i.e. I set up my clocks to have a setup relationship of Xns and a hold relationship of -Xns, and then the external delays are +/-Yns. If you do it this way, you can squeeze Y to be as large as you can on either side, then use that to constrain the other FPGA.
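
    One way to get such a symmetric setup on the FPGA_B side is sketched below. It is only an illustration under assumptions that are not from this thread: a 10ns DDR clock that reaches B_clk shifted by 90 degrees (centered in the data window), which gives X = 2.5ns, and a starting guess of Y = 1.0ns. The clock and port names are reused from the earlier steps.

    ```tcl
    # Assumptions: 10ns period, clock arrives 90 degrees after the data edge, Y = 1.0ns.
    create_clock -period 10.0 -name ext_A_clk   ;# virtual clock that launches the data
    create_clock -period 10.0 -name FPGA_B_clk -waveform {2.5 7.5} [get_ports B_clk]

    # DDR: launch edges at 0/5ns, latch edges at 2.5/7.5ns
    #   setup relationship = +2.5ns, hold relationship = -2.5ns  (X = 2.5)
    set Y 1.0

    # Data launched on the rising edge of ext_A_clk
    set_input_delay -clock ext_A_clk -max $Y [get_ports B_inputs*]
    set_input_delay -clock ext_A_clk -min [expr {-$Y}] [get_ports B_inputs*]

    # Data launched on the falling edge of ext_A_clk
    set_input_delay -clock ext_A_clk -clock_fall -max $Y [get_ports B_inputs*] -add_delay
    set_input_delay -clock ext_A_clk -clock_fall -min [expr {-$Y}] [get_ports B_inputs*] -add_delay
    ```

    With this arrangement the setup and hold margins before internal delays are both X - Y, so Y can be raised step by step until one of the two checks starts to fail.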

    And one last thing: note that at smaller geometries the on-die variation gets larger (plus other issues), so regular source-synchronous interfaces get slower. I used to run DDR registers for source-synchronous Cyclone III interfaces at 600Mbps. 28nm got slower (which is why they added dedicated LVDS SERDES to lower-end devices like Cyclone V) and 20nm is even worse. It's hard to run a regular source-sync DDR interface above 200Mbps in Arria 10. (If you use the altlvds block, which doesn't require timing constraints, you can go above 1Gbps, and 1.6Gbps with DPA...)