Forum Discussion

Altera_Forum
Honored Contributor
14 years ago

Source Synchronous Analysis with TimeQuest

This is a plug, but I just added a document on source-synchronous timing analysis to the alterawiki:

http://www.alterawiki.com/wiki/source_synchronous_analysis_with_timequest

It is a companion guide to the TimeQuest User Guide I have up there (except this one has actual projects, and I want to add more...).

I had great difficulty with this one, as it was hard to judge when I was providing too much information or too little. (You'll see I erred on the side of too much.) I would appreciate any feedback from anyone trying to use it for a real design. If you email me through this forum, please make sure you haven't disabled reply emails. Or just add to this thread. Thanks.

16 Replies

  • Altera_Forum
    Honored Contributor

    So for something like Stratix IV/V, there are dedicated LVDS serializers in the I/O, along with a dedicated clock tree to drive them. Basically the entire receiver is in hard logic and there is no variability. When used, these run much faster than just putting down a PLL and two DDR input registers. Part of the reason is that they are made from this dedicated silicon, and part is that they can be timing analyzed as a macro, so all the pessimisms/unknowns can be removed. In fact, the timing analysis is significantly different. If you build two DDR registers, you have to do timing constraints like my guide shows, but if you use the dedicated silicon via the altlvds_rx megafunction, TimeQuest will spit out an RSKM number and you should use that (and it will be a very good number). Also note that altlvds_rx with a deserialization factor of 2 will build the circuit with DDR registers and use the timing analysis I've described, so it's not a given that using altlvds_rx will use the dedicated silicon.

    Because of the dedicated deserializers and whatnot, those devices have data sheet values for how fast they can run. They look a lot like this Cyclone IV data sheet number. Yet I threw an altlvds_rx into that design (with a deserialization factor of 4) and tried to run it at 350 MHz/700 Mbps, but it did the regular timing analysis (no RSKM) and failed.

    Please file an SR on how to do that or what they mean. Please update this post with your results, as I should know this too. (It sounds like it doesn't help, but all the V series, including Cyclone V, have the dedicated serdes logic for LVDS, enabling much higher data rates and making my app note somewhat less relevant.)
  • Altera_Forum
    Honored Contributor

    Ryan, I have submitted SR 10904310. I have been contacted with initial questions but have not received any real feedback.

    I attended virtual training this past week on Advanced Constraints in TimeQuest. I spoke with Steven Strell about my questions. He was not able to answer them and recommended that I contact you again.

    The main thing I am looking for is a statement from Altera on the maximum performance the Cyclone IV can achieve using source-synchronous DDR. I am working on a design with another group that is referencing the Altera data sheet, which states frequencies above 400 MHz, so 350 MHz should work. Based on your earlier confirmation, 350 MHz will not work.

    Is there any additional support that you might provide?
  • Altera_Forum
    Honored Contributor

    Cyclone III and Cyclone IV both have a section on High-Speed I/O Timing that says TCCS and RSKM can be used and the user does not have to enter timing constraints. There is also a TCCS and Sampling Window (SW) value in the data sheet. Yet if you go in and put source-synchronous timing constraints on an interface in these devices, the timing will be much worse than the datasheet RSKM and TCCS values. What they really mean to say is: don't enter timing constraints, because the design will not meet the numbers; instead, use the Sampling Window (SW)/TCCS values from the datasheet directly in your own calculations.

    Bottom line is the timing models are very conservative, especially when applied to a source-synchronous interface, which relies on skew instead of raw delays. The on-die variation is conservative, the analysis uses the sum of many "worst case" values that would never all happen together, there is no locality pessimism removal (my made-up name), etc. Basically the skew reported with traditional source-synchronous timing looks worse than you'd ever see in silicon, and using the datasheet values is the way this is accounted for.

    I don't like that there isn't any report out of TimeQuest to confirm this, and the user must just take the datasheet value and trust it's right, but that is the methodology.

    Note that with Stratix devices and 28 nm devices (including Cyclone V), there is dedicated high-speed SERDES that is instantiated with the altlvds megafunction (this is not the transceivers, just the serdes logic along with some other dedicated hardware, and note that in 28 nm devices it can be used for I/O standards other than LVDS). If this hardware is used, we know you're doing source-synchronous and therefore don't even allow traditional source-synchronous timing, and will directly report an RSKM and TCCS in TimeQuest. But with Cyclone III/IV, this serdes is built out of logic, and there's no direct way to identify whether a DDR input (which is in the fabric) or output (which is in the I/O) is being used for source-synchronous timing or just old-fashioned system-centric timing, so if it is constrained, TimeQuest still reports a value.
  • Altera_Forum
    Honored Contributor

    For the sake of completeness, I wanted to add to this thread to show that achieving your targeted 700 Mbps across 32 LVDS channels is possible in a Cyclone IV, and the technique for doing this is actually easier than first thought. One of the main issues is that the method for doing the timing analysis is not clearly spelled out in any Altera documentation.

    My test design looked like this:

    https://www.alteraforum.com/forum/attachment.php?attachmentid=6993

    The only MegaFunction you need to instantiate is an ALTLVDS_RX instance. Within this, Quartus will automatically create the PLL and other logic necessary to de-serialize the incoming data stream. I used a de-serialization factor of 8, which results in a 256-bit bus out the back end of the core and a reasonable 87.5 MHz core clock.
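The bus width and core clock quoted above follow directly from the channel count, the per-channel data rate, and the de-serialization factor. A minimal sketch of that arithmetic (the function name is made up for illustration and is not part of any Altera tool):

```python
def deserializer_outputs(channels, data_rate_mbps, factor):
    """Return (parallel bus width in bits, core clock in MHz).

    Hypothetical helper: each serial channel is widened by the
    de-serialization factor, and the core clock slows by the same factor.
    """
    bus_width = channels * factor
    core_clock_mhz = data_rate_mbps / factor
    return bus_width, core_clock_mhz


# Numbers from the post: 32 LVDS channels at 700 Mbps, factor of 8.
bus_width, core_clock_mhz = deserializer_outputs(32, 700.0, 8)
print(f"bus width: {bus_width} bits, core clock: {core_clock_mhz} MHz")
```

With the values from this design it gives the 256-bit bus and 87.5 MHz core clock mentioned above.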

    As mentioned by Rysc, the Cyclone IV does not have dedicated hard Serializers / Deserializers (SERDES) in silicon – for this family, the SERDES logic is actually built out of the core fabric. This has the effect of changing the way timing analysis and constraints are performed on the design. The correct approach to be used for Cyclone IV is actually to use no constraints on the input IO at all! It sounds counter-intuitive, so here’s the explanation.

    The Cyclone IV datasheet gives you a sampling window requirement for the receiver specifications (Table 1-36). For the C6, C7, and C8/A7 speed grades, the sampling window is a fixed 400 ps. You then work backward to determine your Receiver Skew Margin (RSKM); there will not be a skew margin spec in the datasheet since it just shows the required data valid window for the data to be captured correctly. RSKM can be calculated by hand using the equations and methodology shown in the Cyclone IV handbook at:

    http://www.altera.com/literature/hb/cyclone-iv/cyiv-51006.pdf#page=36

    The main drawback with this approach is that you cannot get an RSKM report from Quartus II when not using dedicated SERDES. Essentially, what Altera is saying is that as long as you meet the 400 ps minimum sampling window, the SERDES are “guaranteed to work by design”. In other words, you have to trust them. :D

    For your design:

    (i) The data window is 1.429 ns for 700 Mbps. The sampling window is 400 ps, which means the external device can skew its data in relation to the clock by 1.429 ns - 0.4 ns = 1.029 ns, or +/- 514 ps. This should be very achievable.

    (ii) You need to make sure the capture clock is properly selected to capture the data, e.g. if the external device sends its data edge aligned, you need to center-align the clock by shifting it 90 degrees (within the ALTLVDS_RX GUI). If they send it center-aligned, you don’t shift the clock. If it’s something different (unlikely), then you need to manually adjust the clock phase-shifts.

    (iii) No input timing constraints are entered; the whole analysis is done on paper. Nothing is reported in Quartus for this and TimeQuest will report the inputs as Unconstrained. You can get rid of these warnings by using a set_false_path constraint in TimeQuest.

    (iv) The only major thing you need from the external device data sheet is the skew between the clock and data, together with whether the data is edge-aligned or center-aligned. (They may spec their device differently, such as describing the Tsu/Th they can provide to the receiver, but you can work that back.)
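The paper analysis in (i) and the clock-shift choice in (ii) can be sketched in a few lines. This is only the arithmetic from this post, not the full handbook RSKM methodology; the function names and the 300 ps device skew at the end are hypothetical values chosen for illustration.

```python
def skew_budget_ps(data_rate_mbps, sampling_window_ps):
    """Return (bit period, total skew budget, +/- budget), all in ps.

    Mirrors step (i): whatever is left of the bit period after the
    receiver's sampling window is what the external device may use.
    """
    bit_period_ps = 1e6 / data_rate_mbps  # 1 / (Mbps) expressed in ps
    total_budget_ps = bit_period_ps - sampling_window_ps
    return bit_period_ps, total_budget_ps, total_budget_ps / 2.0


def capture_phase_shift_deg(alignment):
    """Step (ii): phase shift to apply to the capture clock."""
    shifts = {"edge": 90, "center": 0}
    if alignment not in shifts:
        raise ValueError("other alignments need a manual phase shift")
    return shifts[alignment]


# 700 Mbps with the 400 ps sampling window quoted from the datasheet.
bit_period, total, plus_minus = skew_budget_ps(700.0, 400.0)
print(f"bit period {bit_period:.0f} ps, budget +/- {plus_minus:.0f} ps")

# Hypothetical external device: +/- 300 ps clock-to-data skew, edge-aligned.
device_skew_ps = 300.0
print("fits budget:", device_skew_ps <= plus_minus)
print("clock shift:", capture_phase_shift_deg("edge"), "degrees")
```

The point of keeping it this simple is that nothing here comes out of TimeQuest; the only inputs are the data rate, the datasheet sampling window, and the external device's skew spec.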

    One other thing I should mention: there’s a bug in the MegaWizard for the ALTLVDS_RX core when trying to use input buses wider than 18 bits. The GUI simply won’t allow this, so I went in and manually edited the main Wizard-generated file and the associated symbol file to give them a 32-bit input and 256-bit output.

    Hopefully this all makes sense – not having to constrain the input IO actually makes things far easier, though there is a slight faith element involved.

    Test design attached
  • Altera_Forum
    Honored Contributor

    I did as you described in the article, but I can't meet timing. There are tsu and th timing violations. What should I do now? My ADC's data rate is 480 Mbps, the data output clock is 240 MHz, and I am using a Cyclone III.

  • Altera_Forum
    Honored Contributor

    StephenG, I downloaded your example design and compiled it on my PC. I analyzed the TimeQuest numbers to verify that the LVDS_RX block was actually sampling the data inside the 400 ps sample window. I attached a spreadsheet that shows the delay times and calculations I made (it also includes the TimeQuest commands to get my numbers).

    Based on my interpretation of TimeQuest, it appears that the LVDS_RX block is not sampling inside the sample window. For your LVDS block you chose a phase alignment of rx_in with respect to rx_inclock of 0 degrees, i.e. edge-aligned, meaning the FPGA needs to phase shift by 90 degrees to sample in the center of the data. In the spreadsheet, my calculations say that the phase shift is somewhere between 30 and 60 degrees for most of the data (across temperature), while the sample window is between 64.8 and 115.2 degrees (90 degrees +/- 200 ps).
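The degree figures for the sample window can be reproduced with a short conversion. This sketch assumes the phase is expressed relative to a 350 MHz clock (700 Mbps DDR); that frequency is an assumption not stated in this post, though it is the one that yields the 64.8 to 115.2 degree range. The function names are made up for illustration.

```python
def ps_to_degrees(delay_ps, clock_mhz):
    """Convert a delay in ps to degrees of one clock period."""
    period_ps = 1e6 / clock_mhz
    return 360.0 * delay_ps / period_ps


def sampling_window_deg(center_deg, half_window_ps, clock_mhz):
    """Return (low, high) edges of the sample window in degrees."""
    half_deg = ps_to_degrees(half_window_ps, clock_mhz)
    return center_deg - half_deg, center_deg + half_deg


def inside(phase_deg, window):
    low, high = window
    return low <= phase_deg <= high


# Assumed 350 MHz clock, 90 degree target, 400 ps window (+/- 200 ps).
window = sampling_window_deg(90.0, 200.0, 350.0)
print(f"window: {window[0]:.1f} to {window[1]:.1f} degrees")

# The observed 30 to 60 degree phase shifts, checked against the window.
for phase in (30.0, 60.0):
    print(phase, "inside window:", inside(phase, window))
```

Under that assumption, both ends of the observed 30 to 60 degree range fall below the lower edge of the window, which is the discrepancy being asked about.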

    Am I missing something in my analysis or is the LVDS_RX block actually sampling incorrectly?

    Thanks for the help!