This is a plug, but I just added a document on source-synchronous timing analysis to the alterawiki (http://www.alterawiki.com/wiki/source_synchronous_analysis_with_timequest). It is a companion guide to the TimeQuest User Guide I have up there (except this one has actual projects, and I want to add more...). This one was hard to write, because it was very difficult to judge when I was providing too much information or too little. (You'll see I erred on the side of too much.) I would appreciate feedback from anyone trying to use it for a real design. If you email me through this forum, please make sure you haven't disabled reply emails, or just add to this thread. Thanks.

Great read, thanks. Found a few typos, highlights are in the attachment.

Awesome, thanks. I believe I can add a few sections pretty quickly and will incorporate your edits.

I am using Case 2: the FPGA is the receiver and does not phase-shift the clock. When running through the example, Quartus 12.0 SP 2.16 reports the following warning:

Warning (15062): PLL "ssync_pll:inst1|altpll:altpll_component|ssync_pll_altpll:auto_generated|pll1" in Source Synchronous mode with compensated output clock set to clk[0] is not fully compensated because it does not feed an I/O input register

This makes sense because the DDR flip-flops are not in the I/O, but is this warning safe to ignore? Should one expect to always see this warning when doing source-synchronous DDR? Is there some way to add a constraint to suppress the warning?

I also see this for all data lines:

Warning (176441): The I/O pin ssync_rx_data[x] cannot meet the timing constraints due to conflicting requirements. The I/O pin is a PLL compensated I/O, but the setup/hold requirements are in conflict with the source PLL mode(source synchronous or ZDB ).

Can this one be ignored too? In TimeQuest, after running the provided TQ_analysis.tcl with no changes to the provided examples, I get hold timing violations. I would expect at minimum that this report shows everything passing. Is this expected? Thanks for the help.

Are you targeting Cyclone III/IV, which doesn't have I/O input registers? If so, the first warning should be safe to ignore, since Quartus can't put the registers in the I/O, as you state. I imagine it's a generic warning saying, "You changed the PLL mode to source-synchronous and I assume you want I/O registers...". As long as the design meets timing, ignore it. As for the second, I'm not quite sure what it means, but again, if you have correct timing constraints and meet timing, you should be fine. I just compiled Case 2 in Quartus (a build of 12.1 that will be released soon) and it met timing. Not sure what you're seeing. Is Case 2 the same thing you're doing in your design (frequency and external delays)?

I found that I was running my modified constraints; after reverting to the supplied example, timing is met and I only see the first warning. I am trying to get this interface to work at 350 MHz on both Cyclone III and Cyclone IV, speed grade 7. I read somewhere that DDR interfaces could reach 400 MHz, so I thought this should work. To test it, I set my external board delays to zero, so that all of the delay is in the FPGA, and then set the clock to 350 MHz. That is what produced the second warning and failed timing. It appears the I/O timing constraints are conflicting because of the high clock speed. Can you confirm the maximum DDR frequency assuming zero external delays? Thanks

Source Synchronous Analysis with TimeQuest

16 Replies

Altera_Forum
Honored Contributor
13 years ago
So for something like Stratix IV/V, there are dedicated LVDS serializers in the I/O, along with a dedicated clock tree to drive them. Basically the entire receiver is in hard logic and there is no variability. When used, these run much faster than just putting down a PLL and two DDR input registers. Part of the reason is that they are made from this dedicated silicon, and part is that they can be timing-analyzed as a macro, so all the pessimism and unknowns can be removed. In fact, the timing analysis is significantly different. If you build two DDR registers, you have to write timing constraints as my guide shows, but if you use the dedicated silicon via the altlvds_rx megafunction, TimeQuest will report an RSKM number and you should use that (and it will be a very good number). (Also note that altlvds_rx with a deserialization factor of 2 builds the circuit with DDR registers and the timing analysis I've described, so using altlvds_rx does not guarantee that you get the dedicated silicon.)
Because of the dedicated deserializers and whatnot, those devices have data sheet values for how fast they can run. They look a lot like this Cyclone IV data sheet number. Yet when I put an altlvds_rx into that design (with a deserialization factor of 4) and tried to run it at 350 MHz/700 Mbps, it did the regular timing analysis (no RSKM) and failed.
Please file an SR asking how to achieve that or what they mean, and please update this post with your results, as I should know this too. (It sounds like it doesn't help you, but all the V-series devices, including Cyclone V, have dedicated SERDES logic for LVDS, enabling much higher data rates and making my app note somewhat less relevant.)
Altera_Forum
Honored Contributor
13 years ago
Ryan, I have submitted SR 10904310. I have been contacted with some initial questions, but haven't received any real feedback.

I attended the virtual training on Advanced Constraints in TimeQuest this past week and spoke with Steven Strell about my questions. He was not able to answer them and recommended that I contact you again.

The main thing I am looking for is a statement from Altera on the maximum performance Cyclone IV can achieve using source-synchronous DDR. I am working on a design with another group that is referencing the Altera data sheet, which states frequencies above 400 MHz, so 350 MHz should work. Based on your earlier confirmation, however, 350 MHz will not work.

Is there any additional support you might provide?
Altera_Forum
Honored Contributor
13 years ago
Cyclone III and Cyclone IV both have a section on High-Speed I/O Timing that says TCCS and RSKM can be used and the user does not have to enter timing constraints. There are also TCCS and Sampling Window (SW) values in the data sheet. Yet if you put source-synchronous timing constraints on an interface in these devices, the timing will be much worse than the data sheet RSKM and TCCS values suggest. What they really mean is: don't enter timing constraints, because the analysis will not meet those numbers; instead, use the Sampling Window (SW) and TCCS from the data sheet directly in your own calculations.
Bottom line: the timing models are very conservative, especially when applied to a source-synchronous interface, which relies on skew rather than raw delays. The on-die variation is conservative, the analysis sums many "worst case" values that would never all occur together, there is no locality pessimism removal (my made-up name), etc. Basically, the skew reported by traditional source-synchronous timing analysis looks worse than you'd ever see in silicon, and using the data sheet values is how that is accounted for.
I don't like that there isn't any report out of TimeQuest to confirm this, and the user must just take the datasheet value and trust it's right, but that is the methodology.
Note that Stratix devices and 28 nm devices (including Cyclone V) have dedicated high-speed SERDES that is instantiated with the altlvds megafunction (this is not the transceivers, just the SERDES logic along with some other dedicated hardware; in 28 nm devices it can also be used for I/O standards other than LVDS). If this hardware is used, we know you're doing source-synchronous capture, so we don't even allow traditional source-synchronous timing analysis and instead report an RSKM and TCCS directly in TimeQuest. With Cyclone III/IV, however, the SERDES is built out of logic, and there's no direct way to tell whether a DDR input (which is in the fabric) or output (which is in the I/O) is being used for source-synchronous timing or for old-fashioned system-centric timing, so if it is constrained, TimeQuest still reports a value.
Altera_Forum
Honored Contributor
12 years ago
For the sake of completeness, I wanted to add to this thread to show that achieving your targeted 700 Mbps across 32 LVDS channels is possible in a Cyclone IV, and the technique for doing it is actually easier than first thought. One of the main issues is that the method for doing the timing analysis is not clearly spelled out in any Altera documentation.

My test design looked like this:

https://www.alteraforum.com/forum/attachment.php?attachmentid=6993

The only megafunction you need to instantiate is ALTLVDS_RX. Within it, Quartus automatically creates the PLL and other logic needed to deserialize the incoming data stream. I used a deserialization factor of 8, which results in a 256-bit bus out of the back end of the core and a reasonable 87.5 MHz core clock.
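The bus width and clock rates above follow from simple arithmetic. A minimal Python sketch, assuming the forwarded clock is a DDR clock at half the bit rate (as in the 350 MHz/700 Mbps case earlier in the thread); `lvds_rx_rates` is a hypothetical helper, not part of any Altera tool:

```python
def lvds_rx_rates(channels, data_rate_mbps, deser_factor):
    """Return (parallel bus width, core clock in MHz, DDR input clock in MHz)."""
    # Each serial channel is deserialized into deser_factor parallel bits.
    bus_width = channels * deser_factor
    # The parallel side runs deser_factor times slower than the serial bit rate.
    core_clock_mhz = data_rate_mbps / deser_factor
    # ASSUMPTION: DDR forwarded clock, one bit captured on each clock edge.
    ddr_clock_mhz = data_rate_mbps / 2
    return bus_width, core_clock_mhz, ddr_clock_mhz


print(lvds_rx_rates(32, 700, 8))  # (256, 87.5, 350.0)
```

With a deserialization factor of 4 instead, the same 32 channels would give a 128-bit bus and a 175 MHz core clock.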

As mentioned by Rysc, the Cyclone IV does not have dedicated hard serializers/deserializers (SERDES) in silicon; for this family, the SERDES logic is built out of the core fabric. This changes the way timing analysis and constraints are performed on the design. The correct approach for Cyclone IV is actually to use no constraints on the input I/O at all! It sounds counter-intuitive, so here's the explanation.

The Cyclone IV data sheet gives a sampling window requirement in the receiver specifications (Table 1-36). For the C6, C7, and C8/A7 speed grades, the sampling window is a fixed 400 ps. You then work backward to determine your Receiver Skew Margin (RSKM); there is no skew margin spec in the data sheet, since it only shows the required data valid window for the data to be captured correctly. RSKM can be calculated by hand using the equations and methodology shown in the Cyclone IV handbook at:
http://www.altera.com/literature/hb/cyclone-iv/cyiv-51006.pdf#page=36
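As a rough sketch of that hand calculation, the snippet below assumes the form RSKM = (TUI - SW - TCCS) / 2, which is how Altera's high-speed I/O chapters typically state it (check it against the linked handbook page). TUI is the bit period and TCCS is the transmitter channel-to-channel skew; here the external device's clock-to-data skew is treated as TCCS. `rskm_ps` and the 200 ps skew are hypothetical example values.

```python
def rskm_ps(data_rate_mbps, sw_ps, tccs_ps):
    """Receiver skew margin in ps, ASSUMING RSKM = (TUI - SW - TCCS) / 2."""
    tui_ps = 1e6 / data_rate_mbps  # bit period (unit interval) in ps
    return (tui_ps - sw_ps - tccs_ps) / 2


# 700 Mbps, 400 ps sampling window, 200 ps transmitter skew (example value)
print(f"RSKM = {rskm_ps(700, 400, 200):.0f} ps")  # RSKM = 414 ps
```

A negative result would mean the transmitter skew eats into the 400 ps sampling window.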

The main drawback of this approach is that you cannot get an RSKM report from Quartus II when not using dedicated SERDES. Essentially, what Altera is saying is that as long as you meet the 400 ps minimum sampling window, the SERDES is "guaranteed to work by design". In other words, you have to trust it. :D

For your design:

(i) The data window (unit interval) is 1.429 ns at 700 Mbps. The sampling window is 400 ps, which means the external device can skew its data relative to the clock by 1.429 ns - 0.400 ns = 1.029 ns in total, or +/- 514 ps. This should be very achievable.

(ii) You need to make sure the capture clock is positioned correctly to capture the data. For example, if the external device sends its data edge-aligned, you need to center-align the clock by shifting it 90 degrees (within the ALTLVDS_RX GUI). If the data is sent center-aligned, you don't shift the clock. If it's something else (unlikely), you need to adjust the clock phase shifts manually.

(iii) No input timing constraints are entered; the whole analysis is done on paper. Nothing is reported in Quartus for this, and TimeQuest will report the inputs as unconstrained. You can get rid of these warnings with a set_false_path constraint in TimeQuest.

(iv) The main thing you need from the external device's data sheet is the skew between clock and data, along with whether the interface is edge-aligned or center-aligned. (The vendor may spec the device differently, for example by describing the Tsu/Th it provides to the receiver, but you can work back from that.)
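The paper checks in (i), (ii), and (iv) can be sketched in a few lines of Python. This is a minimal sketch, not an Altera method: it assumes a DDR forwarded clock at half the bit rate and, for the Tsu/Th conversion, that the 400 ps sampling window is centered on the (possibly phase-shifted) capture edge. All function names are hypothetical.

```python
# Paper timing check for a Cyclone IV soft-SERDES LVDS receiver (sketch).
# ASSUMPTIONS: DDR forwarded clock at half the bit rate; the sampling
# window (SW) is centered on the capture edge.

SW_PS = 400.0  # C6/C7/C8/A7 sampling window quoted in this thread


def unit_interval_ps(data_rate_mbps):
    """One bit period in picoseconds."""
    return 1e6 / data_rate_mbps


def skew_budget_ps(data_rate_mbps, sw_ps=SW_PS):
    """Total and per-side (+/-) clock-to-data skew the transmitter may have."""
    total = unit_interval_ps(data_rate_mbps) - sw_ps
    return total, total / 2


def phase_shift_ps(clock_mhz, degrees):
    """Convert a PLL phase shift in degrees into picoseconds."""
    return (1e6 / clock_mhz) * degrees / 360.0


def tsu_th_to_skew_ps(data_rate_mbps, tsu_ps, th_ps):
    """Skew implied by a transmitter's guaranteed setup/hold around a
    center-aligned capture edge (hypothetical helper)."""
    half_ui = unit_interval_ps(data_rate_mbps) / 2
    late = half_ui - tsu_ps   # previous transition may be this much later than ideal
    early = half_ui - th_ps   # next transition may be this much earlier than ideal
    return late, early


def fits_sampling_window(data_rate_mbps, tsu_ps, th_ps, sw_ps=SW_PS):
    """True if the implied skew stays within the per-side budget."""
    _, per_side = skew_budget_ps(data_rate_mbps, sw_ps)
    return max(tsu_th_to_skew_ps(data_rate_mbps, tsu_ps, th_ps)) <= per_side


if __name__ == "__main__":
    rate = 700.0           # Mbps per channel
    ddr_clock = rate / 2   # 350 MHz forwarded clock
    total, per_side = skew_budget_ps(rate)
    print(f"UI = {unit_interval_ps(rate) / 1000:.3f} ns, "
          f"skew budget = {total / 1000:.3f} ns (+/- {per_side:.0f} ps)")
    shift = phase_shift_ps(ddr_clock, 90)
    print(f"90 deg at {ddr_clock:.0f} MHz = {shift:.1f} ps "
          f"= {shift / unit_interval_ps(rate):.2f} UI")
    print(fits_sampling_window(rate, tsu_ps=300, th_ps=300))
```

Run as a script, this prints `UI = 1.429 ns, skew budget = 1.029 ns (+/- 514 ps)`, then `90 deg at 350 MHz = 714.3 ps = 0.50 UI`, then `True`. The second line shows why a 90-degree shift center-aligns edge-aligned DDR data: with one bit per clock edge, a quarter of the clock period is half a bit period. Under the centered-window assumption, the Tsu/Th check reduces to requiring both Tsu and Th of at least SW/2.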

One other thing I should mention: there's a bug in the MegaWizard for the ALTLVDS_RX core when trying to use input buses wider than 18 bits. The GUI simply won't allow it, so I manually edited the main wizard-generated file and the associated symbol file to give them a 32-bit input and a 256-bit output.

Hopefully this all makes sense. Not having to constrain the input I/O actually makes things far easier, though there is a slight element of faith involved.

Test design attached: multiple-attachments.zip (22 KB)
Altera_Forum
Honored Contributor
12 years ago
I did as you described in the article, but I can't meet timing: there are setup (tsu) and hold (th) violations. What should I do now? My ADC's data rate is 480 Mbps, the data output clock is 240 MHz, and I am using Cyclone III.
Altera_Forum
Honored Contributor
12 years ago
StephenG, I downloaded your example design and compiled it on my PC. I analyzed the TimeQuest numbers to verify that the LVDS_RX block was actually sampling the data inside the 400 ps sampling window. I attached a spreadsheet that shows the delay times and calculations I made (it also includes the TimeQuest commands I used to get my numbers).

Based on my interpretation of TimeQuest, it appears that the LVDS_RX block is not sampling inside the sampling window. For your LVDS block, you chose a phase alignment of rx_in with respect to rx_inclock of 0 degrees, i.e. edge-aligned, meaning the FPGA needed to phase-shift the clock 90 degrees to sample in the center of the data. In the spreadsheet, my calculations show that the phase shift is somewhere between 30 and 60 degrees for most of the data (across temperature), while the sampling window is between 64.8 and 115.2 degrees (+/- 200 ps).
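The 64.8 to 115.2 degree figures can be reproduced by converting the +/- 200 ps half-window into clock degrees. A minimal sketch, assuming a 350 MHz rx_inclock (700 Mbps DDR) and a window centered at the intended 90-degree shift; `window_in_degrees` is a hypothetical helper, and this only checks the unit conversion, not whether the block samples correctly:

```python
def window_in_degrees(clock_mhz, center_deg, half_window_ps):
    """Express a +/- half_window_ps window around center_deg in clock degrees."""
    period_ps = 1e6 / clock_mhz
    half_deg = half_window_ps / period_ps * 360.0
    return center_deg - half_deg, center_deg + half_deg


lo, hi = window_in_degrees(350.0, 90.0, 200.0)
print(f"sampling window: {lo:.1f} to {hi:.1f} degrees")

# Compare against the 30-60 degree range quoted from the spreadsheet.
for measured in (30.0, 60.0):
    print(measured, lo <= measured <= hi)
```

At 350 MHz one degree is about 7.94 ps, so 200 ps is 25.2 degrees on either side of 90, and both ends of the quoted 30 to 60 degree range fall outside that window.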

Am I missing something in my analysis or is the LVDS_RX block actually sampling incorrectly?

Thanks for the help!
LVDS_FPGA_Internal_Timing.zip (12 KB)
