Forum Discussion

Seadog
Occasional Contributor
6 years ago

Timing error on Arria 10 PCIe core design

I have an Arria 10 design with a PCIe core (hard IP, 8 lanes, Gen3, Avalon memory-mapped, with DMA). The only timing errors I am getting are related to the reset generated by the PCIe core. The reset is synchronized to the 250 MHz clock domain and is used for both core-internal logic (DMA controller, etc.) and attached external (user-generated) logic. Both setup and recovery violations occur, with worst-case negative slack of more than 1 ns. Here is an example:

-1.119 qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_rst_sync.syncrstn_avmm_sriov.app_rstn_altpcie_reset_delay_sync_altpcie_a10_hip_hwtcl|sync_rst[0] qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_avmm_256_dma.avmm_256_dma.altpcieav_256_app|write_data_mover_2|dma_wr_wdalign|desc_lines_release_reg[8] qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|wys~CORE_CLK_OUT qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|wys~CORE_CLK_OUT 4.000 -0.178 4.976 1

Initially I was seeing errors for external logic as well, but placing a register at the boundary of the core and retiming the reset before feeding it to the external logic eliminated those errors, so it now appears errors are limited to the PCIe core internals.
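
For anyone wanting to try the same boundary fix, here is a minimal sketch of that kind of retiming register. The module name, port names, reset polarity, and stage count are placeholders I picked for illustration; they are not names from the PCIe core or from my design.

```verilog
// Hypothetical reset retiming stage placed at the PCIe core boundary.
// The incoming reset is already synchronous to the 250 MHz clock, so
// these are plain pipeline registers, not a clock-domain-crossing
// synchronizer. Active-low polarity is an assumption.
module reset_retime #(
    parameter STAGES = 1              // assumed pipeline depth (>= 1)
) (
    input  wire clk_250,              // 250 MHz clock from the PCIe core
    input  wire core_rst_n,           // reset as it leaves the core
    output wire user_rst_n            // retimed reset for user logic
);
    // Power up with the reset asserted.
    reg [STAGES-1:0] rst_pipe = {STAGES{1'b0}};

    // Shift the reset through the pipeline. The concatenation is one
    // bit wider than rst_pipe, so the oldest bit is dropped on assignment.
    always @(posedge clk_250)
        rst_pipe <= {rst_pipe, core_rst_n};

    assign user_rst_n = rst_pipe[STAGES-1];
endmodule
```

The extra register gives the fitter a local copy of the reset to place near the user logic, at the cost of releasing that logic from reset STAGES clock cycles later than the core.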

I put the sync_rst[] signals on globals, but this did not seem to help (maybe they were already on globals?).

Ideas?

Thanks.

19 Replies

  • Seadog
    Occasional Contributor

    OK, so I started doing this, and got about halfway through the list (there are 27 .sdc files total) without seeing any improvement. So I took the extreme approach and removed all of the .sdc files except:

    • the project .sdc file
    • the three (temporary? not sure how this works) .sdc files which appear for a while during build under the qdb folder

    When I do this, the timing errors go away, but there are no timing checks at all for the clock domain which was causing the problem; in fact, approximately half of the clocks no longer appear in the clock report (with the .sdc files in, there are ~135 clocks; without the .sdc files, there are only about 62 clocks).

  • BoonT_Intel
    Frequent Contributor

    Hi Sir,

    Sorry, I just learned from a colleague who is familiar with timing that removing the SDC files only masks the issue; it does not remove it.

    He mentioned that for IP, usually the removal/recovery violations on a reset signal that comes from outside the IP can be safely ignored. This is because the IP already synchronizes the reset.

  • Seadog
    Occasional Contributor

    I understand that removing .sdc files will remove constraints; but I went ahead with this approach in order to eliminate any possible conflicting constraints, based on your response of 3/4/20.

    Regarding ignoring the timing errors, I cannot do that, because the errors are not confined to the reset signal; there are also datapath errors, such as:

    -0.571

    qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_avmm_256_dma.avmm_256_dma.altpcieav_256_app|read_data_mover|avmmwr_burst_cntr[0]

    qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_avmm_256_dma.avmm_256_dma.altpcieav_256_app|read_data_mover|rd_status_fifo|fifo_reg[6][8]

    qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|wys~CORE_CLK_OUT

    qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|wys~CORE_CLK_OUT 4.000 -0.199 4.350 1

    And I don't understand what you are saying here:

    "removal/recovery violations on a reset signal that comes from outside the IP can be safely ignored. This is because the IP already synchronizes the reset"

    All of the timing violations I am seeing are within the IP core, and they are clock-clock violations which only occur (by definition) with synchronous signals.

  • BoonT_Intel
    Frequent Contributor

    Hi Sir,

    Referring back to your original description, the failure is on reset recovery, as shown below.

    -1.119 qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_rst_sync.syncrstn_avmm_sriov.app_rstn_altpcie_reset_delay_sync_altpcie_a10_hip_hwtcl|sync_rst[0] qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_avmm_256_dma.avmm_256_dma.altpcieav_256_app|write_data_mover_2|dma_wr_wdalign|desc_lines_release_reg[8] qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|wys~CORE_CLK_OUT qsys_design.synth_qys_inst.hchip_blob_inst|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|wys~CORE_CLK_OUT 4.000 -0.178 4.976 1

    But from what you are reporting now, the violation is on another path and not on the recovery path?

  • BoonT_Intel
    Frequent Contributor

    Hi @Seadog,

    Referring to my earlier post, can you clarify which violation you are actually facing in the design? I need accurate information so that I can discuss it with colleagues who work on timing.

  • Seadog
    Occasional Contributor

    OK, I have new information.

    The module which carries the PCIe core is instantiated with a Verilog generate structure, like this:

    parameter enable_blob = 1; // 1 = yes, 0 = no

    generate
    begin: qsys_design                  // outer named block
        if (enable_blob == 1'b1)
        begin: synth_qys_inst           // only elaborated when enabled
            blob blob_inst (
                . . .
            );
        end
    end
    endgenerate

    The PCIe core is instantiated within a wrapper which is instantiated in 'blob'. This version of the design will not make timing.

    So I built a simplified version of the design, which has the PCIe wrapper instantiated in the top level module; the only other things in that module are a PLL to generate system clocking, a reset control module, and a dummy 'terminator' to prevent the compiler from optimizing out the PCIe core. The simple design makes timing, and I am able to partition the critical portions of the DMA controller (which are:

    . . .|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_avmm_256_dma.avmm_256_dma.altpcieav_256_app

    . . .|pcie_wrapper_inst|u0|pcie_a10_hip_0|pcie_a10_hip_0|g_dmacontrol.dmacontrol.dma_control_0

    )

    and set the preservation level for those partitions to 'Final'. I was then able to hack in various other parts of the design (DDR4 controller, 10GE Ethernet MAC/PHY, bus bridges and various logic functions) and connect them in a more or less meaningful way, while maintaining the simplified hierarchy, which, and this seems to be important, does not include the Verilog generate structure. This design makes timing, with ~ 280ps of positive slack.
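
    For reference, a dummy 'terminator' of the kind mentioned above can be very small. This is only a sketch of the idea, not the module from my design; the module name, port names, and bus width are made up for illustration.

    ```verilog
    // Hypothetical 'terminator': folds a wide bus into one registered bit.
    // When keep_out is routed to a top-level output, the logic driving
    // data_in has an observable fan-out and is not removed as unused.
    module terminator #(
        parameter WIDTH = 256             // assumed bus width
    ) (
        input  wire             clk,
        input  wire [WIDTH-1:0] data_in,  // otherwise-unconnected core outputs
        output reg              keep_out  // connect to a spare top-level pin
    );
        always @(posedge clk)
            keep_out <= ^data_in;         // XOR-reduce every bit of the bus
    endmodule
    ```

    An XOR reduction is used so that every input bit affects the output and none of them can be trimmed individually.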

    So I started rebuilding my original design. I started with just the PLL, the reset control module, and the PCIe wrapper/core, but this time instantiated with the generate structure and the extra layer of hierarchy above the PCIe core. This does not make timing. But if I remove the generate structure, and otherwise leave the hierarchy unchanged, it does make timing.

    So to sum it up:

    top>generate:blob>pcie_wrapper>pcie_core - does not make timing

    top>blob>pcie_wrapper>pcie_core - does make timing

    top>pcie_wrapper>pcie_core - does make timing

    The difference in performance between good and bad results is about 1.2ns of slack for the 250MHz clock domain, and the errors are confined to the DMA portion of the PCIe core.

    I think I can replace the generate structure (which is there to allow simulation of the top-level design without the PCIe or DDR cores, which slow down sim and are not always needed) with simple conditional compile commands. So I think I have a solution, but I am still curious why the generate structure seems to be causing so much trouble.
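
    A rough sketch of what I have in mind for the conditional compile version follows. The macro name ENABLE_BLOB and the stub ports are assumptions for illustration; the real 'blob' port list is much larger.

    ```verilog
    // Stand-in for the real 'blob' module; the ports are placeholders.
    module blob (
        input  wire clk,
        input  wire rst_n,
        output wire status
    );
        assign status = rst_n;
    endmodule

    module top (
        input  wire clk,
        input  wire rst_n,
        output wire status
    );
    `ifdef ENABLE_BLOB                    // hypothetical macro, set per build
        // No generate block, so no extra named scopes in the instance path.
        blob blob_inst (
            .clk    (clk),
            .rst_n  (rst_n),
            .status (status)
        );
    `else
        // Core left out (e.g. for fast simulation): tie off its outputs.
        assign status = 1'b0;
    `endif
    endmodule
    ```

    The macro would be defined in the synthesis project and left undefined (or defined selectively) in the simulator, which gives the same on/off control as the enable_blob parameter without the generate scopes.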

  • BoonT_Intel
    Frequent Contributor

    Hi @Seadog

    Thanks for your update; glad to know that you were able to find a solution. I have no answer for why the generate structure is causing the trouble, but I will feed this observation back to the validation team and see if we can improve this in the future.

    Thanks

  • BoonT_Intel
    Frequent Contributor

    You're welcome, and thanks for sharing as well.

    Hope everyone stays safe and healthy. 💪