I'm trying unsuccessfully to use an IOPLL to synthesize a clock and have it be in-phase with the reference clock, where both the reference clock and the synthesized clock are routed on GCLKs. By "in-phase" I mean ideally zero or near-zero skew, i.e. where the rising edges of the synthesized clock and the reference clock line up with each other. And to clarify/simplify, both the reference clock and the synthesized clock are just used to clock internal fabric resources; there is no external I/O involved.

With all attempts thus far, I'm seeing very large skew between the synthesized clock and the reference clock, as much as ~5 ns. So I can only assume that something is fundamentally wrong with how I'm configuring the IOPLL. If necessary I can provide a simple design example, timing reports, etc. But before diving into that, possibly unnecessarily, let's start with some basic questions.

And just for background reference, I'm well familiar with PLLs and de-skewing techniques in general, and have done this routinely in Xilinx devices. But I'm not as familiar yet with the Arria 10 PLL resources, and am finding it somewhat difficult to find good information. The Intel/Altera documentation I've found describing the IOPLL has been fairly scant, and the IP generator and simulation library models obscure critical details on low-level configuration options, internal functionality, feedback paths, etc... so at this point I must humbly request some guidance from knowledgeable Intel/Altera insiders, please.

So, here we go:

I'm using the IP generator wizard to configure the IOPLL in "normal" compensation mode (with all other options left at their defaults). "Normal" mode, as tersely described in Altera documentation, "compensates for the delay of the internal clock network used by the clock output". Of the listed compensation modes, that description, while not entirely clear, sounded like the appropriate choice for what I'm trying to accomplish. Is it? It claims to compensate, and yet it doesn't expose the lock feedback path to the user, so any means by which it is trying to compensate is hidden from me.

So firstly, was that even a correct interpretation of the description of "normal" mode? Is "normal" mode meant to produce clocks that on GCLKs will be in phase with the reference clock that is also on a GCLK? And if not, please steer me in the right direction, and we'll go from there.

Thanks,
-Roee

Hi Dima,

The approach I ended up taking was to use an IOPLL in its "direct" compensation mode, and then deal with the non-zero skew at the clock domain crossings.

With the IOPLL in direct compensation mode, you do end up with some positive skew from the upstream clock to the downstream clock. But this skew is predictable/repeatable, doesn't vary significantly from build to build, and is on the order of a couple of nanoseconds. So, knowing the approximate skew relationship between the clocks, you can still deal with it statically using fully synchronous design techniques.

As to handling the domain crossings, the following is a relatively simple approach that can meet timing up to moderately high clock frequencies (for an Arria 10 in -1 speed grade, say ~350 MHz):

Crossing from the downstream clock domain to the upstream clock domain is the simpler of the two crossings. You can simply go direct reg-to-reg. Setup is the limiting factor here, and the time available for the reg-to-reg path is essentially the entire clock period minus the clock skew (and of course minus the clock-to-out of the source reg and the setup time requirement of the destination reg). Even after losing those couple of nanoseconds to clock skew, at the clock frequencies we're talking about you should still have ample setup slack on a direct reg-to-reg path.

Crossing from the upstream clock domain to the downstream clock domain is the more tricky crossing, with hold being the limiting factor. If you simply go direct reg-to-reg, you will probably end up with unfixable hold violations due to the clock skew. My solution to this was to go reg-to-reg-to-reg, where the first register stage is on the rising edge of the upstream clock, the second is on the falling edge of the downstream clock, and the third is on the rising edge of the downstream clock. This ensures that you have no less than half a clock period of time available for each reg-to-reg path, assuming a 50% duty cycle on the downstream clock. Or more precisely, you have half a clock period plus the clock skew for the first reg-to-reg path, and exactly half a clock period for the second reg-to-reg path. This should ensure ample hold slack, though we are now setup limited again.

If you're pushing for a high clock frequency where the half-period reg-to-reg path becomes the critical path for setup, you can further balance the time available between the register stages by modifying the duty cycle of the downstream clock, which you can naturally do using the configuration of the IOPLL. You have a total of one clock period plus the skew to go reg-to-reg-to-reg, so nominally you'd want the falling edge used for the middle register stage right in the middle of that, which you can get closer to by adjusting the duty cycle of the downstream clock. But again, I'd only bother with this optimization if this crossing becomes your critical path. Otherwise just keep it simple with a 50% duty cycle.

For applications where the attainable timing performance is adequate, the above approach has the advantages of being relatively simple, fully synchronous, having no timing exceptions, requiring no additional clock phases, and requiring no special timing constraints. And if you need to push the performance even higher, you'll have to get even more creative, and I have... but I won't go into that here.

Hope this helps, and let me know if you have any questions.

-Roee
roee@porcupinetech.com
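To put rough numbers on the crossing budgets described above, here is a minimal Python sketch. It is only an illustration under stated assumptions: both clocks are taken to have the same period, the skew is positive (downstream later than upstream) and smaller than one period, and the function names `crossing_budgets` and `balanced_duty` and the 4 ns / 2 ns figures are hypothetical, not taken from any Intel tool or from the design discussed here. The budgets are the ideal edge-to-edge times, before subtracting clock-to-out and setup time.

```python
def crossing_budgets(period_ns, skew_ns, duty=0.5):
    """Ideal time available on each reg-to-reg hop (ns).

    skew_ns: how much later the downstream clock edge arrives than the
    upstream edge (assumed positive and smaller than one period).
    duty: high-time fraction of the downstream clock.
    """
    return {
        # downstream rising -> upstream rising: one period, shortened by the skew
        "down_to_up": period_ns - skew_ns,
        # upstream rising -> downstream falling: high time plus the skew
        "up_to_down_stage1": duty * period_ns + skew_ns,
        # downstream falling -> downstream rising: the low time
        "up_to_down_stage2": (1.0 - duty) * period_ns,
    }


def balanced_duty(period_ns, skew_ns):
    """Duty cycle that places the falling edge midway through period + skew."""
    return 0.5 - skew_ns / (2.0 * period_ns)


if __name__ == "__main__":
    period, skew = 4.0, 2.0  # hypothetical example values, in ns
    print(crossing_budgets(period, skew))
    d = balanced_duty(period, skew)
    print(d, crossing_budgets(period, skew, duty=d))
```

With a 50% duty cycle the two upstream-to-downstream hops get unequal budgets (half a period plus skew, then half a period); with the balanced duty cycle they each get half of period plus skew.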

I think you may be using the wrong mode. See the user guide: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/archives/ug_altera_iopll-18-1.pdf Maybe you want to be using zero-delay buffer mode? It's not entirely clear what your goal is so I'm kind of guessing.

Thanks, @sstrell, but zero delay buffer mode doesn't seem to be what I need. That's for putting out a clock to the board via a chip level I/O pin, and de-skews the clock for that external output, not for an internal GCLK.

From that doc (UG-01155), which I have been scouring:

"If you select the zero delay buffer mode, the PLL must feed an external clock output pin and compensate for the delay introduced by that pin. The signal observed on the pin is synchronized to the input clock. The PLL clock output connects to the altbidir port and drives zdbfbclk as an output port. If the PLL also drives the internal clock network, a corresponding phase shift of that network occurs."

My situation is purely on-chip, no external I/O involved. Assume I have a given clock signal "clk1" that's already on a GCLK, and I need to produce another clock signal "clk2" that is also on a GCLK, is at an integer multiple of the frequency of "clk1", and is phase-aligned with "clk1". That's what I'm trying to accomplish.

It seems like in principle what I need is more analogous to the IOPLL's "external mode", which exposes the feedback path to the user. Except that instead of running the feedback path off-chip through I/O pins, I need to run the feedback path on-chip through just a clock control block and GCLK. But "external mode" doesn't allow that either, it already has the input pin and output pin I/O buffers built in and has to go off-chip. So...?

Hi,

In Normal mode, the FBCLK_IN pin of the IOPLL is fed by a CLKCTRL block whose input is driven by the FBOUT output pin of the IOPLL. The tool adds this CLKCTRL block automatically, which is why the IP does not expose FBCLK_IN to the user. You can check this in the Technology Map Viewer after running the Fitter.

The IOPLL User Guide states the following, and the tool appears to implement it:

"If you select the normal mode, the PLL compensates for the delay of the internal clock network used by the clock output. If the PLL is also used to drive an external clock output pin, a corresponding phase shift of the signal on the output pin occurs."

Now I'd like to ask about the measurement technique used to verify whether there is a delay between the input clock and the generated output clock. How are you measuring the delay between the clocks?

Regards

Hi @Ash_R_Intel, thank you for your response.

Your description of normal mode matches what I thought it should do, and yes, I can confirm via the technology viewer that it is in fact implementing the feedback path exactly as you described. So it should be able to de-skew as intended, but it doesn't seem to. I say this based on the clock skew shown in the static timing analysis report.

Pasted below is a snippet from the .sta.rpt from a trivial design example showing a reg-to-reg timing path going from the "clk2" domain (the output clock from the IOPLL) to the "clk1" domain (the input clock to the IOPLL). As you can see in the report, the clock path is mapped as expected, with "clk1" on CLKCTRL_2I_G_I7, which then goes to the IOPLL, then to "clk2" on CLKCTRL_3C_G_I21 (and the IOPLL's feedback path is not explicitly shown in this report but is confirmed via technology view to be exactly as you described).

Now, as you can see in the report, there is a massive hold violation (-4.661ns) resulting from a massive skew between these two clocks (5.084ns). And we can see in the report there is a compensation delay being applied in the IOPLL (-9.485ns, shown as type "COMP"), but it isn't obvious to me how it's coming up with that compensation amount, as this is not having a de-skewing effect. Rather, it actually seems to be far too large of an "anti-delay".

Path #1: Hold slack is -4.661 (VIOLATED)
===============================================================================
+---------------------------------------------------------+
; Path Summary                                            ;
+---------------------------------+-----------------------+
; Property                        ; Value                 ;
+---------------------------------+-----------------------+
; From Node                       ; ff3                   ;
; To Node                         ; ff4                   ;
; Launch Clock                    ; clk1                  ;
; Latch Clock                     ; clk1                  ;
; Data Arrival Time               ; -0.543                ;
; Data Required Time              ; 4.118                 ;
; Slack                           ; -4.661 (VIOLATED)     ;
; Worst-Case Operating Conditions ; Slow 900mV -40C Model ;
+---------------------------------+-----------------------+

+-------------------------------------------------------------------------------------+
; Statistics                                                                          ;
+------------------------+-------+-------+-------------+------------+--------+--------+
; Property               ; Value ; Count ; Total Delay ; % of Total ; Min    ; Max    ;
+------------------------+-------+-------+-------------+------------+--------+--------+
; Hold Relationship      ; 0.000 ;       ;             ;            ;        ;        ;
; Clock Skew             ; 5.084 ;       ;             ;            ;        ;        ;
; Data Delay             ; 0.787 ;       ;             ;            ;        ;        ;
; Number of Logic Levels ;       ; 0     ;             ;            ;        ;        ;
; Physical Delays        ;       ;       ;             ;            ;        ;        ;
;  Arrival Path          ;       ;       ;             ;            ;        ;        ;
;   Clock                ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 5     ; 4.989       ; 61         ; 0.000  ; 2.573  ;
;    Cell                ;       ; 9     ; 3.166       ; 39         ; 0.000  ; 0.804  ;
;    PLL Compensation    ;       ; 1     ; -9.485      ; 0          ; -9.485 ; -9.485 ;
;   Data                 ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 1     ; 0.529       ; 67         ; 0.529  ; 0.529  ;
;    Cell                ;       ; 2     ; 0.086       ; 11         ; 0.000  ; 0.086  ;
;    uTco                ;       ; 1     ; 0.172       ; 22         ; 0.172  ; 0.172  ;
;  Required Path         ;       ;       ;             ;            ;        ;        ;
;   Clock                ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 3     ; 2.587       ; 66         ; 0.000  ; 2.587  ;
;    Cell                ;       ; 4     ; 1.321       ; 34         ; 0.000  ; 0.632  ;
+------------------------+-------+-------+-------------+------------+--------+--------+
Note: Negative delays are omitted from totals when calculating percentages

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; Total    ; Incr     ; RF ; Type   ; Fanout ; Location            ; HS/LP      ; Element ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; 0.000    ; 0.000    ;    ;        ;        ;                     ;            ; launch edge time ;
; 0.000    ; 0.000    ;    ; borrow ;        ;                     ;            ; time borrowed ;
; -1.330   ; -1.330   ;    ;        ;        ;                     ;            ; clock path ;
; 0.000    ; 0.000    ;    ;        ;        ;                     ;            ; source latency ;
; 0.000    ; 0.000    ;    ;        ; 1      ; PIN_AR36            ;            ; clk1_p ;
; 0.000    ; 0.000    ; RR ; IC     ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|i ;
; 0.632    ; 0.632    ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|o ;
; 0.762    ; 0.130    ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input~io_48_lvds_tile/ioclkin[2] ;
; 0.762    ; 0.000    ; RR ; IC     ; 2      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|inclk ;
; 1.211    ; 0.449    ; RR ; CELL   ; 5      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|outclk ;
; 3.784    ; 2.573    ; RR ; IC     ; 1      ; IOPLL_3C            ; High Speed ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst|refclk[0] ;
; 4.526    ; 0.742    ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vco_refclk ;
; 4.526    ; 0.000    ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vctrl ;
; -4.959   ; -9.485   ; RR ; COMP   ; 2      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vcoph[0] ;
; -4.155   ; 0.804    ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst|outclk[0] ;
; -4.155   ; 0.000    ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~io_48_lvds_tile/pllcout[4] ;
; -4.155   ; 0.000    ; RR ; IC     ; 2      ; CLKCTRL_3C_G_I21    ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|outclk[0]~CLKENA0|inclk ;
; -3.746   ; 0.409    ; RR ; CELL   ; 1      ; CLKCTRL_3C_G_I21    ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|outclk[0]~CLKENA0|outclk ;
; -1.330   ; 2.416    ; RR ; IC     ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3|clk ;
; -1.330   ; 0.000    ; RR ; CELL   ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3 ;
; -0.543   ; 0.787    ;    ;        ;        ;                     ;            ; data path ;
; -1.158   ; 0.172    ; FF ; uTco   ; 1      ; FF_X77_Y121_N55     ;            ; ff3|q ;
; -1.072   ; 0.086    ; FF ; CELL   ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3~la_lab/laboutb[16] ;
; -0.543   ; 0.529    ; FF ; IC     ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4|asdata ;
; -0.543   ; 0.000    ; FF ; CELL   ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4 ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; Total   ; Incr     ; RF ; Type   ; Fanout ; Location            ; HS/LP      ; Element ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; 0.000   ; 0.000    ;    ;        ;        ;                     ;            ; latch edge time ;
; 0.000   ; 0.000    ;    ; borrow ;        ;                     ;            ; time borrowed ;
; 3.754   ; 3.754    ;    ;        ;        ;                     ;            ; clock path ;
; 0.000   ; 0.000    ;    ;        ;        ;                     ;            ; source latency ;
; 0.000   ; 0.000    ;    ;        ; 1      ; PIN_AR36            ;            ; clk1_p ;
; 0.000   ; 0.000    ; RR ; IC     ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|i ;
; 0.632   ; 0.632    ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|o ;
; 0.791   ; 0.159    ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input~io_48_lvds_tile/ioclkin[2] ;
; 0.791   ; 0.000    ; RR ; IC     ; 2      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|inclk ;
; 1.321   ; 0.530    ; RR ; CELL   ; 5      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|outclk ;
; 3.908   ; 2.587    ; RR ; IC     ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4|clk ;
; 3.908   ; 0.000    ; RR ; CELL   ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4 ;
; 3.754   ; -0.154   ;    ;        ;        ;                     ;            ; clock pessimism removed ;
; 3.754   ; 0.000    ;    ;        ;        ;                     ;            ; clock uncertainty ;
; 4.118   ; 0.364    ;    ; uTh    ; 1      ; FF_X77_Y121_N53     ;            ; ff4 ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+

----------------------------
; Extra Fitter Information ;
----------------------------
HTML report is unavailable in plain text report export.
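As a sanity check on how the headline numbers in the report relate to each other, the following Python sketch re-derives the clock skew and hold slack from the incremental delays listed in the two path tables. It is just arithmetic on the figures pasted above, not a model of how Quartus computes them, and the variable names are made up for illustration.

```python
# Non-zero incremental delays (ns) on the launch clock path,
# copied from the "Incr" column of the Data Arrival Path table.
launch_clock_incr = [0.632, 0.130, 0.449, 2.573, 0.742, -9.485, 0.804, 0.409, 2.416]

# Non-zero incremental delays (ns) on the latch clock path,
# copied from the Data Required Path table.
latch_clock_incr = [0.632, 0.159, 0.530, 2.587]
pessimism_removed = -0.154  # "clock pessimism removed" row
data_delay = 0.787          # uTco + CELL + IC on the data path
hold_time = 0.364           # uTh of ff4

launch_clock = round(sum(launch_clock_incr), 3)                     # clock arrival at ff3
latch_clock = round(sum(latch_clock_incr) + pessimism_removed, 3)   # clock arrival at ff4
clock_skew = round(latch_clock - launch_clock, 3)
data_arrival = round(launch_clock + data_delay, 3)
data_required = round(latch_clock + hold_time, 3)
hold_slack = round(data_arrival - data_required, 3)

if __name__ == "__main__":
    print("launch clock at ff3:", launch_clock)
    print("latch clock at ff4: ", latch_clock)
    print("clock skew:         ", clock_skew)
    print("hold slack:         ", hold_slack)
```

The -9.485 ns COMP entry is the only negative term on the launch clock path, so it alone is what pulls the clk2 arrival at ff3 to a negative time while the clk1 arrival at ff4 stays positive.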

Clock synthesis and de-skewing using an IOPLL in Arria 10

17 Replies

roeekalinsky
Contributor
4 years ago
Thanks, @Ash_R_Intel.

For experiment's sake, modifying my trivial design example to feed the PLL directly from the FPGA input pin, the PLL's refclk arrives at 0.704ns, and clk2 arrives at ff3 at -0.687ns. And to recap the original scenario, having a CLKCTRL upstream of the PLL, the PLL's refclk arrives at 3.784ns, and clk2 arrives at ff3 at -1.334ns.

With the addition of an upstream CLKCTRL / GCLK in the latter case, one would expect a later clk2 arrival time, not earlier as observed. So one must ask, does this observed result even make sense on the face of it?

It's the compensation figure in the PLL that is coming up vastly different between the two scenarios, -5.601ns vs. -9.485ns, respectively, and that's what's responsible for clk2 arriving even earlier with the upstream CLKCTRL rather than much later as one would expect. It is unclear where that difference in compensation arises, as the clock distribution downstream of the PLL is identical between the two scenarios, and I think we're both agreeing that the PLL should not in any way be compensating for the added delay of a CLKCTRL / GCLK upstream of it. Right? The observed difference in compensation does not correspond to a difference in the clock network delays. So from where does this difference in compensation arise?

A purely speculative possible interpretation of the observations above: it almost looks as though Quartus is attempting to adjust the delay of the compensation loop somehow to phase-align clk2 to the FPGA clock input pin in both cases, as though it IS trying to compensate for the added delay of the upstream CLKCTRL / GCLK if present (which is NOT what we expect nor want). Could that be the case? Is that what it's trying to do? And if so, is there a way via constraints or otherwise to prevent Quartus from doing that?
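The comparison between the two scenarios can be laid out as simple arithmetic. The Python sketch below uses only the figures quoted in this post; the "naive expectation" is an assumption made for illustration (that the compensation would stay unchanged when the upstream CLKCTRL / GCLK is added), and the dictionary and variable names are hypothetical.

```python
# Figures quoted in the post (ns).
pin_fed = {"refclk_arrival": 0.704, "comp": -5.601, "clk2_at_ff3": -0.687}
gclk_fed = {"refclk_arrival": 3.784, "comp": -9.485, "clk2_at_ff3": -1.334}

# Extra delay introduced upstream of the PLL by the CLKCTRL / GCLK.
added_upstream_delay = round(gclk_fed["refclk_arrival"] - pin_fed["refclk_arrival"], 3)

# Extra "anti-delay" applied by the PLL compensation in the GCLK-fed case.
added_compensation = round(pin_fed["comp"] - gclk_fed["comp"], 3)

# Naive expectation (assumption): compensation unchanged, so clk2 simply
# arrives later by the added upstream delay.
expected_clk2 = round(pin_fed["clk2_at_ff3"] + added_upstream_delay, 3)

# What the timing reports actually show.
observed_shift = round(gclk_fed["clk2_at_ff3"] - pin_fed["clk2_at_ff3"], 3)

if __name__ == "__main__":
    print("added upstream delay:", added_upstream_delay)
    print("added compensation:  ", added_compensation)
    print("expected clk2 at ff3:", expected_clk2)
    print("observed clk2 shift: ", observed_shift)
```

The added compensation is slightly larger than the added upstream delay, which is why clk2 ends up arriving earlier rather than later in the GCLK-fed case.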

Taking a step back:

The available documentation (UG-01155 and A10-HANDBOOK) does seem to indicate, in the text and in block diagrams, that the IOPLL can optionally receive its refclk input from a GCLK or RCLK network. So, is that fully supported, or not? And if so, what are the expected compensation characteristics with refclk fed from a GCLK or RCLK network?

As to the suggestion of feeding the PLL directly from the FPGA input pin instead:

The trivial design example I presented for discussion is just that. In reality, I don't have the luxury of feeding the PLL directly from the FPGA's input pin. What I'm developing is a reusable IP block that will be integrated into numerous different FPGA designs, where other parties may own the top level and other IP blocks residing in it. There will generally not be much visibility between parties and their respective IP, nor the opportunity to collaborate on the top level clocking scheme. The top level FPGA designs into which this IP block will be integrated may have different and unknown top level clocking schemes, and I can't really make any assumptions about the ultimate origin of the system clock provided to my IP block other than it will already be on a GCLK when I receive it. Then, internally in my IP block, I have a need to generate one or more derived clocks at integer multiples of the incoming system clock frequency and in-phase with it (and possibly also with dynamic gating).

I hope that gives you some context and a better idea of what I'm ultimately trying to accomplish. And if you have other suggestions that can accomplish these requirements within these limitations, I'm all ears.

I should mention too, just for background, that I've been doing this routinely in Xilinx devices, with which admittedly I am far more familiar. I naturally assumed that a similar capability exists in Altera devices, and the documentation seemed to suggest as much, though not clearly... I hope that was not an incorrect assumption/interpretation on my part.

Fundamentally, the capability I'm seeking is this: To be able to take into my block what is already a global clock, and from it to generate new global clocks that are in-phase with it. Is that possible in the Arria 10's PLL / clocking architecture, or not? And if so, how?

I look forward to your input.

Thanks,
-Roee
Ash_R_Intel
Regular Contributor
4 years ago
Hi,
The compensation factor depends on the routing performed by the Fitter. When the design changes, the placement of the CLKCTRL blocks and the registers changes too, so that variation in the compensation factor is quite possible.

I agree, PLL does not compensate for the upstream CLKCTRL. It just takes care of the clocks that are generated from it.

Coming back to the original query, I want to mention a couple of points here.
1) GCLK networks provide low skew for a clock that passes through them.
2) Two different clocks on two different CLKCTRL blocks cannot have identical delays, because they are independent and have to reach different flops at different locations in the chip.
3) The same logic applies to the PLL-generated clocks. They cannot have zero skew between themselves because they drive different paths.
4) The PLL definitely maintains the phase relationship between its input and output clocks.
5) As long as the tool reports that there are no timing failures in the design, skews between the clocks should not be a matter of concern.
6) When a data path crosses from one clock to another, it is better to either provide a set_max_skew constraint or declare that path as a false path.

If you look at a path in the tool driven by the same clock going through a CLKCTRL, you will find near-zero skew, but the same cannot be expected between different clock paths even though they have a fixed relationship. The skew on all the reported paths between the two clocks, however, should remain the same. If the tool reports the same clock skew number on these paths, then we are good.

Hope this helps.

Regards.
roeekalinsky
Contributor
4 years ago
Hi @Ash_R_Intel,

Thanks for the feedback. I've gathered additional information on this issue from other sources as well, and the bottom line is that the Arria 10 IOPLL can't properly support the approach I was trying to take. It can't phase-align a GCLK output to a GCLK reference input. So I will use a different approach to accomplish my design goals.

Note however that there is an incorrect piece of information here, an incorrect understanding/assumption that we both made, as I've now learned. And this is key to the whole thing.

@Ash_R_Intel wrote:
>> I agree, PLL does not compensate for the upstream CLKCTRL. It just takes care of the clocks that are generated from it.

Turns out that's not entirely true, and that's the primary cause for the skew I'm seeing. I've received confirmation that, as I suspected, Quartus is actually trying its best to compensate for all of the delay upstream of the IOPLL, including for the upstream CLKCTRL / GCLK if present (presumably coarsely matching those delays using static delays alone). Even if there's an upstream CLKCTRL / GCLK, it will try to match the output phase of the IOPLL to that of the FPGA input pin upstream of it all, not to the phase of the GCLK at the IOPLL's refclk input. Though this is never made clear in the documentation, this is the defined behavior for the IOPLL's normal mode when downstream of a GCLK. And there is no means by which to disable this behavior.

I wanted to clarify that here for anyone else who may be affected by this.

Thanks,

-Roee
- dlevit
  New Contributor
  4 years ago
  Hi Roee,
  
  I'm in the same situation as you, moved recently from Xilinx to Intel, and puzzling about the same problem. Could you please share the approach which you found in regard to deskewing the clocks?
  
  Thanks,
  
  Dima
Ash_R_Intel
Regular Contributor
4 years ago
Apologies for that statement. You are right, the IOPLL does try to compensate for the pin-to-PLL path. Please refer to the link below:
Intel® Quartus® Prime Pro Edition Help version 21.1 - PLL Compensation Mode logic option

Regards
