Forum Discussion

christian_kamps_fidus
New Contributor
1 month ago

PCIe Enumeration Failure for CXL IP

When attempting to validate the Agilex 7 R-Tile Compute Express Link (CXL) 1.1/2.0 IP (Type 2 and Type 3) using a CXL-compatible host server, the host server stalls during PCIe bus enumeration and never completes it. The stall never resolves after boot, and access to the host is never granted. The stall and its status code, as seen from the host server, are shown in the attached PNG file titled: "pcie_enumeration_stall".

Debugging Information:

  • A PCIe Gen 5.0 reference design using the Altera R-Tile Avalon Streaming IP For PCI Express was used to validate that PCIe enumeration could complete fully without failure, and that the host server could exchange data with the FPGA.
  • While running the CXL example design, the Quartus System Console's Link Logger indicates that the LTSSM is in the "UP_L0" state before the PCIe bus enumeration stall. The state sometimes changes when "Refresh" is used to update the status during the stall, and the link may briefly enter recovery (UP_L0 -> REC_IDLE -> REC_RCVRCFG -> REC_RCSVLOCK -> REC_COMPLETE -> UP_L0). The Link Logger output when this occurs is shown in the attached PNG file titled: "ltssm_link_logger".
  • While running the CXL example design, the Quartus System Console's Link Logger indicates that the advertised and negotiated link speeds and widths are both 32.0 GT/s and x16. The Quartus System Console Overview for a CXL Type 3 design is shown in the attached PNG file titled: "cxl_ip_systemconsole_overview".
  • Instead of generating the example design, the pre-compiled binary offered by Altera for Type 2 and Type 3 CXL IP designs was used and resulted in the exact same failures as shown above.
  • CXL.mem transaction registers (M2S and S2M) are 0x00, indicating that the host server never progresses far enough to begin sending/receiving transactions/requests.
  • Between the PCIe build that functions and the CXL build that does not function (stalls at enumeration), no server UEFI settings were changed. A CXL enable function was enabled for all tests.
  • Several PCIe UEFI settings were changed in an attempt to resolve the enumeration stall, but none changed the outcome.
  • Neither disabling CXL Compliance 2.0 nor disabling the HDM decoder registers resolved the issue.
  • The FPGA was powered and programmed before the server was launched.
  • Two different CXL servers were tested and both resulted in the same behavior.
  • The relevant PCIe and CXL settings from the BIOS are provided as an attached PNG file titled: "cxl_server_settings".
  • The CXL REFCLK was tested as both COMMON and SRIS/SRNS. For each test, SW3 was changed to select the corresponding clock source (the connector-supplied clock for COMMON, the onboard clock for SRIS/SRNS).

IP Settings:

  • CXL IP settings are uploaded as PNG files titled: "cxl_ip_settings_N".
  • The settings tested are the default provided settings as well as a version with a 300 MHz PLD clock (SRIS).

Hardware Details:

  • FPGA is connected to host server via PCIe Gen 5.0 x16 slot on Tile 14C.
  • FPGA device is the Altera Agilex 7 FPGA I-Series Development Kit (Production 2x R-Tile & 1x F-Tile) (AGIB027R29A1E1VB).
  • The DIMM provided with the development kit is slotted into DIMM Slot A.
  • SW1 is set to 1000 (PCIe PRSNT x16).
  • SW3 is set to 0110 for designs using the CXL/PCIe common clock and 0000 for designs using the CXL/PCIe onboard REFCLK (SRIS).

Software Details:

  • Quartus Prime Pro Edition v25.1 was used to generate the designs.
  • R-Tile Altera FPGA IP for Compute Express Link (CXL) was generated with version 1.17.0.

FPGA Design:

  • The FPGA design is generated using the example design with the IP settings given above.
  • A pre-compiled binary provided by Altera was also used to test instead of a generated design.

Server details:

  • SMC AS-1126HS-TN (CXL 2.0 via 4x PCIe gen5 x16 slots)
  • CPU: 2x AMD EPYC 9135 (CXL 2.0)
  • RAM: 4x Micron 64GB @ 6000 MT/s
  • UEFI: AMI 1.7a 10/30/2025

Attachments:

  • The system console debug register outputs are saved to CSV files attached to this post. These CSV files are taken from a CXL Type 3 reference design with PLD REFCLK at 300 MHz (SRIS).

Questions:

  • Can you provide guidance on how to obtain more information about the enumeration status beyond the LTSSM register?
  • Can you provide the UEFI/BIOS settings for PCIe/CXL that were used to test this IP, as a reference?
  • Could the configuration space registers (DVSEC/HDM) or the TLP handling implemented in the CXL example design RTL cause this PCIe enumeration failure?
  • Can you provide guidance on what debug/status registers the CXL IP provides that could be relevant to this issue?

3 Replies

  • Hi Christian,

    Thanks for providing this detailed information; it is helpful.

    If possible, please follow user guide section "6.1. Recommended Hardware Setup" and use an Intel platform for the test.

    Follow section 6.3 to boot the FPGA from a POF. This step is necessary for CXL.

    Regards,

    Rong

    • christian_kamps_fidus
      New Contributor

      "If possible, please follow UG section "6.1. Recommended Hardware Setup" using an Intel platform for the test."

      Unfortunately, we do not have an Intel platform. Was the CXL IP ever tested with non-Intel platforms?

      "Follow section 6.3 to use pof to boot FPGA. This step is necessary for CXL."

      We have successfully programmed the board's non-volatile memory using both a POF and a JIC file. The files tested were the pre-compiled SOF/POF binaries, plus a JIC generated from the pre-compiled SOF. Additionally, we compiled our own example designs to generate SOF, POF, and JIC files. Configuration was verified for each test, for both volatile and non-volatile memory. All tests failed in the same manner as described above.

      Is there a specific reason why the CXL IP requires the configuration image to be located in flash? Our system boots the FPGA board fully before the host server is booted, so there is no timing contention.

      Can you provide any additional guidance on what could be causing the enumeration failure, or steps that can be used to debug the cause?

  • The user debug memory provides more insight into the enumeration status. Registers 0x020E0104 to 0x020E0144 in the user debug memory map hold the PF0 AER registers. The captured register values are given below:


    PF0 AER Uncorrectable Error Status : 0x00000000
    PF0 AER Uncorrectable Error Mask : 0x00000000
    PF0 AER Uncorrectable Error Severity : 0x00062010
    PF0 AER Correctable Error Status : 0x00000001
    PF0 AER Correctable Error Mask : 0x00002000
    PF0 AER Advanced Error Capabilities and Control Register : 0x00000200
    PF0 AER Header Log 0 : 0x04000001
    PF0 AER Header Log 1 : 0xE0002103
    PF0 AER Header Log 2 : 0xEF010048
    PF0 AER Header Log 3 : 0x00000000
    PF0 AER First TLP Prefix Log Register : 0xDEADBEEF
    PF0 AER Second TLP Prefix Log Register : 0xDEADBEEF
    PF0 AER Third TLP Prefix Log Register : 0xDEADBEEF
    PF0 AER Fourth TLP Prefix Log Register : 0x00000011

    PF0 AER Correctable Error Status (0x020E011C) shows an assertion of the "Receiver Error Status". Any idea of what could be causing this?
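
    For anyone who wants to decode these values by hand, here is a minimal Python sketch. It is not part of the Altera tooling. The bit names follow the standard PCIe AER register layout. The helper names (decode_bits, decode_tlp_header) are made up for illustration. The header split assumes that Header Log 0 holds header byte 0 in bits 31:24 and that the logged TLP is a configuration request; both are assumptions that may not hold for this capture.

```python
# Bit positions per the standard PCIe AER capability layout.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP Received",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_bits(value, names):
    """Return the names of all set bits; unknown bits are flagged, not dropped."""
    return [names.get(bit, f"bit {bit} (not in table)")
            for bit in range(32) if value & (1 << bit)]


def decode_tlp_header(dw0, dw1, dw2):
    """Split three header-log DWs into fields.

    Assumption: DW1/DW2 use the configuration-request layout
    (requester ID, tag, byte enables, then bus/device/function/register).
    """
    return {
        "fmt": (dw0 >> 29) & 0x7,
        "type": (dw0 >> 24) & 0x1F,
        "length_dw": dw0 & 0x3FF,
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "bus": (dw2 >> 24) & 0xFF,
        "device": (dw2 >> 19) & 0x1F,
        "function": (dw2 >> 16) & 0x7,
        "register_offset": dw2 & 0xFFC,
    }


if __name__ == "__main__":
    print("Correctable status:", decode_bits(0x00000001, CORRECTABLE_BITS))
    print("Correctable mask:  ", decode_bits(0x00002000, CORRECTABLE_BITS))
    print("Uncorr. severity:  ", decode_bits(0x00062010, UNCORRECTABLE_BITS))
    header = decode_tlp_header(0x04000001, 0xE0002103, 0xEF010048)
    for field, value in header.items():
        print(f"{field:16s} 0x{value:X}")
```

    Under those assumptions, the header words split into Fmt 0 / Type 0x04 (a Type 0 configuration read) of one DW from requester 0xE000, addressed to bus 0xEF, device 0, function 1, register offset 0x48. Since the Uncorrectable Error Status is zero, it is not clear that the header log corresponds to a logged error at all, so treat that reading as a hint only.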