
nskim
New Contributor
1 month ago

Agilex 7 R-Tile CXL Type-2 IP Hang with Incomplete CXL.cache Operations

Device: DK-DEV-AGI027RBES (Power Solution 2)

Software Version: Quartus Prime Pro 24.3

IP Core: CXL Type 2 Hard IP

Issue Description: 

We observed that the CXL Type-2 IP can hang when CXL.cache transactions remain incomplete under heavy workloads. Our design is based on the CXL Type-2 design example, with a delay unit inserted between the CXL IP and the DDR controller.

We ran Intel MLC bandwidth tests on the CXL device while monitoring host reachability using continuous ping. In the first experiment (Figure 1), the delay unit inserted a 10,000-cycle delay for each CXL.mem request, with no CXL.cache operations involved. In this case, ping latency increased from approximately 0.2 ms to 45 ms, but the system remained stable.

In the second experiment (Figure 2), we replaced the 10,000-cycle delay with a CXL.cache operation, which typically completes in around 300 cycles. Under this configuration, the system hung and ping indicated that the host became unreachable. We observed that the CXL.cache request was issued but never received a response, leading to the hang.

We would like to know if there is a known issue or recommended solution for handling incomplete CXL.cache operations in this scenario.

2 Replies

  • Hi,

    So far I am not aware of a known issue directly related to this. In this situation, I suggest checking the Channel Crediting section of the CXL specification and monitoring the corresponding credit counts in your design.

     

    Regards,

    Rong

     

    • yangz
      New Contributor

      Hi Rong,

      I checked the CXL credit usage through the Debug Access Memory Map and monitored the following registers:

      • 0x01001100: DCOH CXL Status Register
      • 0x01001108: DCOH Error Status Register
      • 0x00051410: CXL Link Layer Rx Credit Control Register
      • 0x00051418: CXL Link Layer Rx Credit Return Status Register
      • 0x00051420: CXL Link Layer Tx Credit Status Register
      • 0x00051D08: Credits Available
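
For anyone who wants to reproduce this monitoring, here is a rough sketch of how the registers above could be sampled repeatedly and compared between samples. The offsets are the ones listed above. Everything else is an assumption made for illustration: the short register labels are made up, and `read_reg` is a hypothetical callable that you would have to supply for whatever debug access path you use. No register width or bit-field layout is assumed, so the raw values are compared as-is.

```python
import time
from typing import Callable, Dict, List, Tuple

# Offsets taken from the list above. The labels are informal shorthand
# chosen for this sketch, not official register mnemonics.
REGS: Dict[str, int] = {
    "dcoh_cxl_status": 0x01001100,
    "dcoh_error_status": 0x01001108,
    "ll_rx_credit_control": 0x00051410,
    "ll_rx_credit_return_status": 0x00051418,
    "ll_tx_credit_status": 0x00051420,
    "credits_available": 0x00051D08,
}


def snapshot(read_reg: Callable[[int], int]) -> Dict[str, int]:
    """Read every register once. read_reg is a user-supplied (hypothetical) accessor."""
    return {name: read_reg(addr) for name, addr in REGS.items()}


def diff(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {label: (old, new)} for registers whose raw value changed."""
    return {n: (prev[n], curr[n]) for n in curr if prev[n] != curr[n]}


def poll(read_reg: Callable[[int], int], samples: int,
         interval_s: float = 0.0) -> List[Dict[str, Tuple[int, int]]]:
    """Take `samples` snapshots and return the changes between consecutive ones."""
    changes = []
    prev = snapshot(read_reg)
    for _ in range(samples - 1):
        time.sleep(interval_s)
        curr = snapshot(read_reg)
        changes.append(diff(prev, curr))
        prev = curr
    return changes
```

A run of empty diffs across all registers while traffic is still being issued is one possible indicator that the link has stopped making progress, although that interpretation would need to be confirmed against the IP documentation.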

      When the system crash occurs, I observed that mem_r_avail (number of available CXL.mem request credits) drops from 64 to 0, indicating that the CXL.mem request credits are fully consumed. However, I did not observe any other abnormal behavior related to stalled CXL.cache operations.
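
To illustrate why a drop from 64 to 0 is not conclusive on its own, here is a toy model of a request credit pool. It is not a model of the R-Tile IP. It only assumes what the original post describes: each CXL.mem request is held until a dependent operation completes, and the credit is returned at that point. The function name, the one-request-per-cycle issue rate, and the parameters are all hypothetical.

```python
from collections import deque
from typing import List, Tuple


def simulate_credits(total_credits: int, cycles: int, service_latency: int,
                     cache_responds: bool = True) -> Tuple[List[int], int]:
    """Toy credit pool.

    One request is issued per cycle whenever a credit is available. A request
    returns its credit `service_latency` cycles after issue, but only if the
    dependent operation responds (cache_responds=True).
    Returns (available credits at the end of each cycle, completed requests).
    """
    avail = total_credits
    in_flight = deque()  # completion cycle of each outstanding request
    completed = 0
    history = []
    for cycle in range(cycles):
        # Return credits for requests whose dependent operation has finished.
        while cache_responds and in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            avail += 1
            completed += 1
        # Issue a new request if a credit is available.
        if avail > 0:
            avail -= 1
            in_flight.append(cycle + service_latency)
        history.append(avail)
    return history, completed


# Healthy case: credits still reach 0 under load, but requests keep completing.
ok_hist, ok_done = simulate_credits(64, 1000, 300, cache_responds=True)
# Stuck case: credits reach 0 and nothing ever completes.
bad_hist, bad_done = simulate_credits(64, 1000, 300, cache_responds=False)
```

In this toy model the available count reaches 0 in both cases, because the latency is longer than the credit pool can cover. The difference is whether completions continue. That suggests watching whether the credit count ever recovers, or whether any completion counter keeps advancing, rather than looking only at the instantaneous value.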

      Are there any additional debug registers you would recommend checking to better understand the status of the CXL.cache interface?

      Thanks,
      Yang