Forum Discussion

Altera_Forum (Honored Contributor), 11 years ago

How do I synchronize gray code counts safely across asynchronous clock domains?

I have a design for a generic asynchronous FIFO that I have used for many years. In this FIFO, I use gray code counters for the read and write pointers to the core memory. These multi-bit pointers must be synchronized into the opposite clock domains to compute the full and empty flags (e.g., rdptr is synchronized to wclk and compared against wptr to determine the full flag). I am using a 2-stage flop synchronizer to reduce the risk of metastability.
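As background for the pointer scheme described above, here is a minimal Python model of the gray-code arithmetic. It assumes standard binary-reflected gray code and the common convention (as in Clifford Cummings' asynchronous FIFO papers) of pointers one bit wider than the RAM address, so that empty is an exact match and full is the synchronized read pointer with its top two bits inverted. The 3-bit address width and all function names are illustrative assumptions, not details of the poster's design.

```python
def bin2gray(b: int) -> int:
    """Binary-reflected gray code: adjacent counts differ in exactly one bit."""
    return b ^ (b >> 1)


def gray2bin(g: int) -> int:
    """Invert bin2gray by XOR-ing together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


ADDR_BITS = 3               # hypothetical FIFO depth of 2**3 = 8 entries
PTR_BITS = ADDR_BITS + 1    # one extra wrap bit (a reg [w:0] pointer with w = ADDR_BITS)


def is_empty(rptr_gray: int, wptr_gray_synced: int) -> bool:
    # Read side: the read pointer has caught up with the synchronized write pointer.
    return rptr_gray == wptr_gray_synced


def is_full(wptr_gray: int, rptr_gray_synced: int) -> bool:
    # Write side: the write pointer is exactly one lap ahead. Adding 2**ADDR_BITS
    # flips the binary MSB, which in gray code inverts the top two bits.
    top_two = 0b11 << (PTR_BITS - 2)
    return wptr_gray == (rptr_gray_synced ^ top_two)


if __name__ == "__main__":
    codes = [bin2gray(i) for i in range(1 << PTR_BITS)]
    # Every increment, including the wrap from 15 back to 0, flips exactly one bit.
    for i in range(len(codes)):
        assert bin(codes[i] ^ codes[(i + 1) % len(codes)]).count("1") == 1
    print([f"{c:02X}" for c in codes[8:13]])  # ['0C', '0D', '0F', '0E', '0A']
```

Counts 8 through 12 produce exactly the 0C, 0D, 0F, 0E, 0A sequence used in the example later in the question.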

The problem that I am seeing has to do with the placement (Quartus Fitter) of the original gray code pointer in domain 1 and the first set of flops of the synchronizer in domain 2. For clarity:

reg [w:0] rdptr_rclk;     // rdptr in the rclk domain

always @(posedge rclk)
begin
  rdptr_rclk <= nx_rdptr_rclk;
end

reg [w:0] rdptr_wclk_s1;  // first synchronizer stage of rdptr in the wclk domain
reg [w:0] rdptr_wclk;     // second (final) synchronizer stage of rdptr in the wclk domain

always @(posedge wclk)
begin
  rdptr_wclk_s1 <= rdptr_rclk;
  rdptr_wclk    <= rdptr_wclk_s1;
end

In the sdc file, I have set_false_path between rclk and wclk.

Ideally, all 3 of these registers (rdptr_rclk, rdptr_wclk_s1, and rdptr_wclk) would be placed by the fitter as close together as possible. However, the fitter wants to place the rdptr_rclk flops on one side of the FIFO, close to where the empty flag is generated and used, and it wants to place the rdptr_wclk flops on the opposite side of the FIFO, close to where the full flag is generated and used. The remaining register, rdptr_wclk_s1, is usually placed right next to rdptr_wclk.

The problem occurs when some of the bits of rdptr_wclk_s1 are placed close to their rdptr_rclk counterpart, while other bits are placed far apart, especially when the skew between bits approaches or exceeds the period of the 2 clocks. In this case, the rdptr_wclk_s1 can see a transition on one bit before it sees the earlier transition on a different bit. For example:

Correct rdptr_rclk sequence:

  1. 0C:001100

  2. 0D:001101

  3. 0F:001111 (bit 1 transitions)

  4. 0E:001110 (bit 0 transitions)

  5. 0A:001010 (bit 2 transitions)

Sequence seen by rdptr_wclk_s1:

  1. 0C:001100

  2. 0D:001101

  3. 0C:001100 (bit 0 transitions)

  4. 0A:001010 (bit 1 and 2 transition)

  5. 0A:001010 (no transitions)

The step from (2) 0D to (3) 0C at rdptr_wclk_s1 may look correct (only 1 bit changed), but it is actually a -1 step of the code rather than a +1 step.
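The effect can be reproduced with a small behavioral model. The rclk period, per-bit delays and wclk sample instants below are hypothetical values chosen so that the model reproduces the sequence in the post; they are not measurements from the poster's design. The model ignores metastability entirely, which matches the point that this is a skew problem: each sample simply sees every bit transition that has arrived before it.

```python
def gray2bin(g: int) -> int:
    """Gray to binary: XOR together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


# Source-side gray sequence from the post, one bit changing per rclk edge.
SOURCE_CODES = [0x0C, 0x0D, 0x0F, 0x0E, 0x0A]

# Hypothetical numbers (ns): rclk period and per-bit rdptr_rclk -> rdptr_wclk_s1 delays.
T_RCLK = 10
BIT_DELAY = {0: 1, 1: 24, 2: 1}     # bit 1 routed far away from the other bits
WCLK_SAMPLES = [9, 21, 33, 45, 57]  # hypothetical 12 ns wclk edges, none tied to an arrival


def arrival_events(codes, t_src, delay):
    """Sorted (arrival_time, bit, new_value) for every bit transition at the capture flops."""
    events = []
    for k in range(1, len(codes)):
        changed = codes[k] ^ codes[k - 1]
        bit = changed.bit_length() - 1  # gray code: exactly one bit changes per step
        events.append((k * t_src + delay[bit], bit, (codes[k] >> bit) & 1))
    return sorted(events)


def sample(codes, events, t):
    """Value captured at time t: the initial code plus every transition arrived before t."""
    value = codes[0]
    for when, bit, new in events:
        if when < t:
            value = (value & ~(1 << bit)) | (new << bit)
    return value


if __name__ == "__main__":
    events = arrival_events(SOURCE_CODES, T_RCLK, BIT_DELAY)
    seen = [sample(SOURCE_CODES, events, t) for t in WCLK_SAMPLES]
    counts = [gray2bin(g) for g in seen]
    print("seen:", [f"{g:02X}" for g in seen])             # seen: ['0C', '0D', '0C', '0A', '0A']
    print("steps:", [b - a for a, b in zip(counts, counts[1:])])  # steps: [1, -1, 4, 0]
```

Bit 1's rising edge (launched second) arrives after bit 0's falling edge (launched third), so the destination sees a count of 9 go back to 8 and then jump to 12.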

Note that this is NOT a metastability problem. The problem occurs because the fitter has placed rdptr_wclk_s1[1] far from rdptr_rclk[1] while placing rdptr_wclk_s1[0] right next to rdptr_rclk[0]. Also, this problem is build dependent. One build may have the problem, but it may disappear with the build the next day. And the same build may work on one board (slightly faster FPGA, available to me on my test floor) but have errors on a different one (slow FPGA on the customer's system).
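The reasoning behind a max-skew style requirement can be checked with a short model. A gray counter launches one bit change per source-clock edge, so if the spread of per-bit delays is below one source period, each arrival is still later than the previous one and the first synchronizer stage only ever sees transitions in launch order. This is a sufficient condition, not a necessary one, and the delay values and widths below are hypothetical.

```python
from itertools import product


def bin2gray(b: int) -> int:
    return b ^ (b >> 1)


def max_skew(delay):
    """Spread between the slowest and fastest bit paths (same units as t_src)."""
    return max(delay) - min(delay)


def arrivals_in_order(width, t_src, delay):
    """True if every bit transition of a free-running `width`-bit gray counter reaches
    the first synchronizer stage in launch order (one change per t_src).
    delay[i] is the source-to-destination delay of bit i."""
    n = 1 << width
    arrivals = []
    for k in range(1, 2 * n + 1):                 # two full laps, including the wrap
        changed = bin2gray(k % n) ^ bin2gray((k - 1) % n)
        bit = changed.bit_length() - 1            # exactly one bit changes per step
        arrivals.append(k * t_src + delay[bit])
    return all(a < b for a, b in zip(arrivals, arrivals[1:]))


if __name__ == "__main__":
    T = 10  # hypothetical source clock period, ns
    print(max_skew([1, 24, 1]), arrivals_in_order(3, T, [1, 24, 1]))  # 23 False
    print(max_skew([1, 9, 1]), arrivals_in_order(3, T, [1, 9, 1]))    # 8 True
    # Exhaustive check of the sufficient condition over a grid of delays.
    for d in product(range(0, 30, 3), repeat=3):
        if max_skew(d) < T:
            assert arrivals_in_order(3, T, d)
```

Ordering is only part of the story: the model has no setup/hold window at the capture flops, so two arrivals landing just under a period apart could still both be close to one wclk edge. Leaving some margin below one period is the prudent reading of this bound.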

In the tools for a different FPGA vendor, I am able to specify a DATAPATHONLY requirement of 1ns on the nets going into rdptr_wclk_s1, telling the placement tool to place the rdptr_wclk_s1 flops no further than 1ns away from the rdptr_rclk flops. But I have not found any way to do this with the Quartus tools.

The best that I am able to do is to create a LogicLock region around my FIFO (or just around my synchronizer), but this is an after-the-fact step that is easy to forget when new FIFOs are added to a design. I would really like something that I can put into my code or into my constraints that will handle this for every FIFO in my design.

Is there a different way of constraining this to force the fitter to place these 3 sets of flops near each other?

Or is there a different way of coding this to be more tolerant of the placement?

18 Replies

  • Altera_Forum (Honored Contributor)

    I tried multicycle constraints of 2/1 and 1/0. In the latter case, I am now getting setup time violations as well as the skew violation. The fitter is still inserting unnecessary (from my point of view) delay between flops. These flops are in the same or adjacent LABs, but the physical routing is going all over the place.

    Is there a way in the SDC to do something like: "for each inst in { find ajbg_fifo } do { set_max_skew { -from {$inst.rdptr_rclk} -to {$inst.rdptr_wclk_s1} } }"?
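SDC files are Tcl, so a loop like this could be written directly in the SDC with foreach. As a sketch of the per-instance expansion, the Python below generates the equivalent SDC text instead (a substitute technique, not something Quartus provides). The instance paths, the write-pointer register names and the 1.0 ns skew value are hypothetical; only set_max_skew, get_registers and the {name[*]} bus pattern are standard Quartus SDC usage.

```python
# Hypothetical instance paths; replace with the real FIFO hierarchy.
FIFO_INSTANCES = ["top|u_fifo_a", "top|u_fifo_b"]
POINTER_PAIRS = [
    ("rdptr_rclk", "rdptr_wclk_s1"),  # read pointer into the wclk domain (names from the post)
    ("wrptr_wclk", "wrptr_rclk_s1"),  # write pointer into the rclk domain (assumed names)
]


def fifo_skew_constraints(instances, pairs, skew_ns):
    """Emit one set_max_skew per FIFO instance and per pointer crossing, so each
    pointer bus is constrained on its own instead of lumping all FIFOs together."""
    lines = []
    for inst in instances:
        for src, dst in pairs:
            lines.append(
                f"set_max_skew -from [get_registers {{{inst}|{src}[*]}}] "
                f"-to [get_registers {{{inst}|{dst}[*]}}] {skew_ns:.3f}"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # 1.0 ns echoes the DATAPATHONLY value mentioned in the question; it is only an example.
    print(fifo_skew_constraints(FIFO_INSTANCES, POINTER_PAIRS, 1.0))
```

This only addresses writing one constraint per FIFO instance; whether such constraints actually steer placement is a separate question, since the poster already reports skew violations with constraints in place.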
  • Altera_Forum (Honored Contributor)

    I am facing a similar situation. Did you ever figure out a good way to constrain these?

  • Altera_Forum (Honored Contributor)

    Thanks. I have filed a support request. If I hear anything from them, I will post it here.

  • Altera_Forum (Honored Contributor)

    I have a similar problem with setting the constraints for a Gray counter.

    Did byates ever receive a response to the support request?

    Thanks.
  • Altera_Forum (Honored Contributor)

    I did not receive a useful reply from Altera support. However, the local FAE helped a lot and came back with some suggestions. I also found some papers from Zimmer Design Services (http://www.zimmerdesignservices.com/index.php?section=12) that describe the problem and its solutions for the ASIC world. His technique doesn't work directly, but there is some very good information there.

    In the end I came up with the following solution which is overly complicated but does work. The design I'm using it on is a StratixV 5SGXEA7K2F40C3 with 123,666 of 234,720 ALMs used ( 53 % ).

    1. The first step is to generate a list of all source-to-destination register pairs that cross clock domains on a gray-coded pointer, and to extract their timing information. I created a Tcl script that runs in TimeQuest, generates this list, and writes it to a file. This information can't be calculated from within the SDC file because of limits on which functions Quartus allows to be called while the SDC is being processed, so I generate it ahead of time and store it in a file.

      • Use get_registers to get a list of all the gray coded registers for both the read and write sides of the FIFO.

      • Use get_timing_paths -from to get a list of all paths that go from a gray coded register to another register.

      • Write the timing information along with the path source and destination registers to the output file.

      • Each line in the file is: src_reg_name dest_reg_name src_clock dst_clock src_period dst_period counter_width_in_bits

      • Not all of the information in the output file is used but it is easy to calculate so I added it in case it becomes useful later.

    2. In my SDC file, I use set_clock_groups to tell TimeQuest which clocks are related; clocks in different groups are treated as unrelated.

    3. In my SDC file, after all the set_clock_groups statements, I call another TCL script that dynamically generates timing constraints for each source->destination pair in the file output from the first stage.

      • I create a new clock for each source and destination clock listed in the file. All the skew constraints use the new clocks. Remember, the original clocks have clock groups applied and are ignored for cross-clock timing analysis. The new clocks are set up to ignore all signals EXCEPT the gray-coded paths we care about. The Zimmer paper discusses why this is necessary.

      • I use set_min_delay, set_max_delay, and set_false_path to prevent timing analysis of all other signals on the new clocks.

      • I use set_max_delay to set a constraint equal to the desired skew on the new_src_clk->new_dest_clk path.

      • The tricky part is the skew limit. The set_max_delay command causes Quartus to limit the max_delay path but that path includes the clock_source routing delay to the register. We don't expect that clock_source delay to vary a lot from register to register (for a given clock) but we don't know what that delay is and we don't have a good way to calculate it. So, I set the skew limit to be the source clock period and add a 2ns fudge factor to account for clock source delay. So far that seems to work.

    4. To check the skew I use another TCL script that I run from TimeQuest which generates skew reports for each timing model. This script was created by folks at Altera. I modified it slightly to match the names of the registers in my design.
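The data flow of steps 1 and 3 can be sketched as follows: parse the seven-field lines described in step 1 and turn each register pair into a set_max_delay whose limit is the source clock period plus the 2 ns fudge factor. This is a simplified stand-in for the attached scripts, which are not reproduced in the thread. It assumes periods are in ns, omits creating the new analysis clocks and the set_false_path/set_min_delay exclusions, and emits a per-register-pair -from/-to form that may differ from what the real scripts generate. The sample register names are hypothetical.

```python
from dataclasses import dataclass

FUDGE_NS = 2.0  # allowance for clock-source delay, as described in the post


@dataclass
class GrayPath:
    src_reg: str
    dst_reg: str
    src_clock: str
    dst_clock: str
    src_period: float  # assumed to be in ns
    dst_period: float  # assumed to be in ns
    width: int


def parse_paths(text):
    """Parse 'src_reg dst_reg src_clock dst_clock src_period dst_period width' lines."""
    paths = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        src, dst, sclk, dclk, sper, dper, width = fields
        paths.append(GrayPath(src, dst, sclk, dclk, float(sper), float(dper), int(width)))
    return paths


def skew_limit(path, fudge=FUDGE_NS):
    """Source clock period plus a fixed fudge factor for clock-source delay."""
    return path.src_period + fudge


def max_delay_command(path):
    """One SDC line per register pair (hypothetical form)."""
    return (f"set_max_delay -from [get_registers {{{path.src_reg}}}] "
            f"-to [get_registers {{{path.dst_reg}}}] {skew_limit(path):.3f}")


# Hypothetical file contents in the seven-field format.
SAMPLE = """\
fifo0|rdptr_rclk[0] fifo0|rdptr_wclk_s1[0] rclk wclk 8.000 6.400 5
fifo0|rdptr_rclk[1] fifo0|rdptr_wclk_s1[1] rclk wclk 8.000 6.400 5
"""

if __name__ == "__main__":
    for p in parse_paths(SAMPLE):
        print(max_delay_command(p))
```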

    There are some issues with this approach:

    1. Each build uses gray coded timing information from the last time you ran the generate timing file script. This is not too big a deal since most of the time the gray coded paths don't change that often.

    2. You have to run the 'generate timing information' script each time you add another FIFO (or remove one) in order to update the timing information file. I run it when I know the design has changed or when it has been a while just for good measure. I don't run it every time because it takes a long time and most of the time the output file is unchanged.

    3. The script called from the SDC file to dynamically generate skew constraints does not seem to add much time to the build process. My build takes about 2.5 hours, and the skew part seems to make less than a 5-minute difference, if that.

    4. The script to generate the timing information (run after the build completes) takes a long time to run (~30 minutes or more).

    5. The script to generate skew report takes a long time to run (~30 minutes or more).

    6. The skew limit is fudged a little due to the source_clock routing delay being used by Quartus for set_max_delay.

    I have attached the files I am using. They are not generic. You will have to modify paths and such. You will also need to change the pattern matching value for the places where lists of registers are being generated. I'm not an expert with TCL so please excuse any strangeness you find! Or make improvements.

    1. cv_lib.tcl is a library with routines that are called from other scripts.

    2. gen_fifo_constraints.tcl is called from TimeQuest to generate the gray code timing data file.

    3. skew_report.tcl is called from TimeQuest to generate all the skew reports.

    In your SDC file, you will need to add a reference to the cv_lib library. You will also need to call the function create_fifo_skew_constraints after you generate all your clocks and clock constraints.

    
    lappend ::auto_path "<path to directory containing cv_lib>"
    package require cv_lib
    ...
    #  generate clocks <create_clock>
    #  constrain clocks <set_clock_groups>
    ...
    cv_lib::create_fifo_skew_constraints
    
  • Altera_Forum (Honored Contributor)

    Sorry for my poor English, I'm from China

    I found something strange. The "SCFIFO and DCFIFO IP Cores User Guide", which is the Altera MegaCore FIFO user guide, says:

    When using the Quartus II TimeQuest timing analyzer with a design that contains a DCFIFO block, apply the following false paths to avoid timing failures in the synchronization registers:

    • For paths crossing from the write into the read domain, apply a false path assignment between the delayed_wrptr_g and rs_dgwp registers:

    set_false_path -from [get_registers {*dcfifo*delayed_wrptr_g[*]}] -to [get_registers {*dcfifo*rs_dgwp*}]

    • For paths crossing from the read into the write domain, apply a false path assignment between the rdptr_g and ws_dgrp registers:

    set_false_path -from [get_registers {*dcfifo*rdptr_g[*]}] -to [get_registers {*dcfifo*ws_dgrp*}]

    The false path assignments are automatically added through the HDL-embedded Synopsys design constraint (SDC) commands when you compile your design. The related message is shown under the TimeQuest timing analyzer report.

    Note: The constraints are internally applied but are not written to the Synopsys Design Constraint File (.sdc). To view the embedded false paths, type report_sdc in the console pane of the TimeQuest timing analyzer GUI.

    To my knowledge, the gray-coded pointers in the FIFO should be constrained with set_max_delay so that the pointer delay does not exceed one cycle of the fastest clock. They should not be constrained with set_false_path in the DCFIFO.

    I don't know why the guide recommends this.

    When I generate an asynchronous FIFO in Xilinx Vivado, I find that the tool generates a set_max_delay constraint in the XDC file.

    I am confused.
  • Altera_Forum (Honored Contributor)

    Yes, I think you are confused.

    The pointers cross asynchronous clock domains, so a false path is a must; otherwise timing will be reported as failing on these paths, and timing-closure effort will be wasted.

    set_max_delay is a separate issue that you can choose to apply if it helps.

    If Xilinx does this automatically in their FIFO, I hope Altera will follow, but I know Altera already places these registers close enough together through some internal, undocumented means.