christian_kamps_fidus
New Contributor
1 month ago
PCIe Enumeration Failure for CXL IP
When attempting to validate the Agilex 7 R-Tile Compute Express Link (CXL) 1.1/2.0 IP (Type 2 and Type 3) using a CXL-compatible host server, the host server stalls during PCIe bus enumeration and never completes it. The stall never resolves after boot, and access to the host is never granted. A depiction of the stall and its status code from the host server's perspective is provided as an attached PNG file titled "pcie_enumeration_stall".
Debugging Information:
- A PCIe Gen 5.0 reference design using the Altera R-Tile Avalon Streaming IP For PCI Express was used to validate that PCIe enumeration could complete fully without failure, and that the host server could exchange data with the FPGA.
- While running the CXL example design, the Quartus System Console's Link Logger indicates that the LTSSM state is "UP_L0" before the PCIe bus enumeration stall. The state sometimes changes when the status is refreshed ("Refresh") during the stall, briefly entering recovery (UP_L0 -> REC_IDLE -> REC_RCVRCFG -> REC_RCSVLOCK -> REC_COMPLETE -> UP_L0). A depiction of the Quartus System Console's Link Logger when this occurs is provided as an attached PNG file titled "ltssm_link_logger".
- While running the CXL example design, the Quartus System Console's Link Logger indicates that the advertised and negotiated link speeds and widths are both 32.0 GT/s and x16. A depiction of a CXL Type 3 Quartus System Console's Overview is provided as an attached PNG file titled "cxl_ip_systemconsole_overview".
- Instead of generating the example design, the pre-compiled binary offered by Altera for Type 2 and Type 3 CXL IP designs was used and resulted in the exact same failures as shown above.
- CXL.mem transaction registers (M2S and S2M) are 0x00, indicating that the host server never progresses far enough to begin sending/receiving transactions/requests.
- Between the PCIe build that functions and the CXL build that does not function (stalls at enumeration), no server UEFI settings were changed. A CXL enable function was enabled for all tests.
- Several PCIe UEFI settings were changed in an attempt to resolve the enumeration stall, but none changed the outcome.
- Attempts to disable CXL Compliance 2.0 and the HDM decoder registers did not resolve the issue.
- The FPGA was powered and programmed before the server was launched.
- Two different CXL servers were tested and both resulted in the same behavior.
- The relevant PCIe and CXL settings from the BIOS are provided as an attached PNG file titled "cxl_server_settings".
- The CXL REFCLK was tested as both COMMON and SRIS/SRNS. For each test, SW3 was changed to select the corresponding clock source (onboard or connector-based).
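For anyone reproducing this, the recovery excursions noted above could be tallied from a Link Logger trace with something like the following. This is only a minimal sketch: the trace is assumed to be a hand-copied, chronologically ordered list of state names (the Link Logger's actual export format is not assumed here), and `summarize_recovery` is a hypothetical helper, not part of any Altera tool.

```python
from collections import Counter

# Hypothetical trace: LTSSM state names in chronological order, copied by hand
# from the Link Logger. The list format is an assumption for illustration.
trace = ["UP_L0", "REC_IDLE", "REC_RCVRCFG", "REC_RCSVLOCK", "REC_COMPLETE", "UP_L0"]


def summarize_recovery(states, l0="UP_L0", recovery_prefix="REC_"):
    """Count L0 -> recovery excursions and tally the recovery sub-states seen."""
    excursions = 0
    substates = Counter()
    previous = None
    for state in states:
        if state.startswith(recovery_prefix):
            substates[state] += 1
            # An excursion starts when a recovery state directly follows L0.
            if previous == l0:
                excursions += 1
        previous = state
    returned_to_l0 = bool(states) and states[-1] == l0
    return excursions, substates, returned_to_l0


excursions, substates, returned_to_l0 = summarize_recovery(trace)
print(f"recovery excursions: {excursions}, ended in L0: {returned_to_l0}")
for name, count in substates.items():
    print(f"  {name}: {count}")
```

A rising excursion count across refreshes would suggest the link is repeatedly retraining during the stall rather than sitting idle in L0.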
IP Settings:
- CXL IP settings are uploaded as PNG files titled: "cxl_ip_settings_N".
- The settings tested are the default provided settings as well as a version with a 300 MHz PLD clock (SRIS).
Hardware Details:
- FPGA is connected to host server via PCIe Gen 5.0 x16 slot on Tile 14C.
- FPGA device is the Altera Agilex 7 FPGA I-Series Development Kit (Production 2x R-Tile & 1x F-Tile) (AGIB027R29A1E1VB)
- The DIMM provided with the development kit is slotted into DIMM Slot A.
- SW1 is set to 1000 (PCIe PRSNT x16).
- SW3 is set to 0110 for designs using the CXL/PCIe common clock and 0000 for designs using the CXL/PCIe onboard REFCLK (SRIS).
Software Details:
- Quartus Prime Pro Edition v25.1 was used to generate the designs.
- R-Tile Altera FPGA IP for Compute Express Link (CXL) was generated with version 1.17.0.
FPGA Design:
- The FPGA design is generated using the example design with the IP settings given above.
- A pre-compiled binary provided by Altera was also used to test instead of a generated design.
Server details:
- SMC AS-1126HS-TN (CXL 2.0 via 4x PCIe gen5 x16 slots)
- CPU: 2x AMD EPYC 9135 (CXL 2.0)
- RAM: 4x Micron 64GB @ 6000 MT/s
- UEFI: AMI 1.7a 10/30/2025
Attachments:
- The system console debug register outputs are saved to CSV files attached to this post. These CSV files are taken from a CXL Type 3 reference design with PLD REFCLK at 300 MHz (SRIS).
Questions:
- Can you provide guidance on how to obtain more information on the enumeration status other than the LTSSM register?
- Can you provide the UEFI/BIOS settings for PCIe/CXL that were used to test this IP as a reference?
- Could the configuration space registers (DVSEC/HDM) or the TLP handling implemented in the CXL example design RTL create this PCIe enumeration failure?
- Can you provide guidance on what debug/status registers the CXL IP provides that could be relevant to this issue?
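To make the DVSEC question concrete, below is a minimal sketch of how the DVSEC entries could be listed from a raw 4 KiB configuration-space image, assuming such an image can be captured at all (for example, Linux exposes one at /sys/bus/pci/devices/<BDF>/config on a host that boots). It walks the PCIe extended capability list starting at offset 0x100 and reports every Designated Vendor-Specific Extended Capability (capability ID 0x0023). The helper names and the demo image are hypothetical, and the DVSEC IDs and lengths in the demo are example values, not values read from the R-Tile CXL IP.

```python
import struct

PCI_EXT_CAP_START = 0x100      # extended capabilities begin at this offset
PCI_EXT_CAP_ID_DVSEC = 0x0023  # Designated Vendor-Specific Extended Capability
CXL_VENDOR_ID = 0x1E98         # vendor ID used in CXL DVSEC headers


def read32(cfg, offset):
    """Read one little-endian 32-bit register from the config-space image."""
    return struct.unpack_from("<I", cfg, offset)[0]


def find_dvsecs(cfg):
    """Return (offset, vendor_id, dvsec_id, length) for each DVSEC found."""
    found = []
    offset = PCI_EXT_CAP_START
    visited = set()  # guards against a malformed, looping capability list
    while offset and offset not in visited and offset + 4 <= len(cfg):
        visited.add(offset)
        header = read32(cfg, offset)
        if header in (0, 0xFFFFFFFF):
            break  # empty list or unreadable config space
        cap_id = header & 0xFFFF
        next_offset = (header >> 20) & 0xFFC
        if cap_id == PCI_EXT_CAP_ID_DVSEC and offset + 12 <= len(cfg):
            header1 = read32(cfg, offset + 4)  # vendor ID, revision, length
            header2 = read32(cfg, offset + 8)  # DVSEC ID in the low 16 bits
            found.append((offset, header1 & 0xFFFF, header2 & 0xFFFF, header1 >> 20))
        offset = next_offset
    return found


def make_demo_image():
    """Build a synthetic image with two DVSECs (example values only)."""
    cfg = bytearray(4096)
    struct.pack_into("<I", cfg, 0x100, 0x0023 | (1 << 16) | (0x140 << 20))
    struct.pack_into("<I", cfg, 0x104, CXL_VENDOR_ID | (0x38 << 20))
    struct.pack_into("<I", cfg, 0x108, 0x0000)
    struct.pack_into("<I", cfg, 0x140, 0x0023 | (1 << 16))  # next = 0 ends list
    struct.pack_into("<I", cfg, 0x144, CXL_VENDOR_ID | (0x24 << 20))
    struct.pack_into("<I", cfg, 0x148, 0x0008)
    return bytes(cfg)


for offset, vendor, dvsec_id, length in find_dvsecs(make_demo_image()):
    tag = "CXL" if vendor == CXL_VENDOR_ID else "other"
    print(f"DVSEC @ {offset:#05x}: vendor={vendor:#06x} ({tag}) id={dvsec_id:#06x} len={length:#x}")
```

Comparing such a listing between the working PCIe build and the CXL build might help narrow down whether a particular capability is where the host stops.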