Forum Discussion

nskim
New Contributor
2 months ago

Agilex 7 R-Tile RBES FPGA – CXL Device Enumeration Failure with CXL IP Design Example

1. What is the failure symptom? Please elaborate on the failure symptoms in detail.
The CXL device fails to enumerate when using the CXL Type-3 IP design example.
•    lspci -vvv | grep 0ddb returns no match for the CXL device
•    numactl -H does not report a CXL NUMA node
The issue persists across multiple system reboots and bitstream rebuilds. A factory reset was attempted but did not resolve the issue.
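The two checks above can also be scripted against Linux sysfs, which makes it easier to compare the failing unit with a known-good board. This is a minimal Python sketch under two assumptions: that 0x0ddb is the PCI device ID the design example enumerates with (the value the grep searches for), and that the CXL memory is onlined as system RAM so it appears as a CPU-less NUMA node. If the memory is not onlined as system RAM, the second check will not show it. The helper names (find_pci_devices, memory_only_numa_nodes) are hypothetical and not part of any Intel/Altera tool.

```python
from pathlib import Path

# ASSUMPTION: 0x0ddb is the PCI device ID the CXL Type-3 design example
# enumerates with (the value the "lspci -vvv | grep 0ddb" check searches for).
# The vendor ID is deliberately not checked in this sketch.
TARGET_DEVICE_ID = 0x0DDB


def find_pci_devices(device_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the BDF names of PCI functions whose device ID equals device_id."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    matches = []
    for dev in sorted(root.iterdir()):
        try:
            # sysfs stores IDs as hex strings such as "0x0ddb"
            value = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue
        if value == device_id:
            matches.append(dev.name)
    return matches


def _node_mem_total_kb(node_dir):
    """Parse MemTotal (kB) from a node's meminfo file; return 0 if unavailable."""
    try:
        text = (node_dir / "meminfo").read_text()
    except OSError:
        return 0
    for line in text.splitlines():
        parts = line.split()
        # Expected line shape: "Node 2 MemTotal:   16777216 kB"
        if len(parts) >= 4 and parts[2] == "MemTotal:" and parts[3].isdigit():
            return int(parts[3])
    return 0


def memory_only_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return IDs of NUMA nodes that report memory but have no CPUs."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for node_dir in root.glob("node[0-9]*"):
        try:
            cpulist = (node_dir / "cpulist").read_text().strip()
        except OSError:
            continue
        # An empty cpulist plus non-zero memory is the usual shape of a CXL memory node
        if not cpulist and _node_mem_total_kb(node_dir) > 0:
            found.append(int(node_dir.name[len("node"):]))
    return sorted(found)


if __name__ == "__main__":
    print("Endpoints with device ID 0x0ddb:", find_pci_devices(TARGET_DEVICE_ID) or "none")
    print("CPU-less NUMA nodes with memory:", memory_only_numa_nodes() or "none")
```

Run on the host with the FPGA board installed; an empty result from both functions corresponds to the symptoms described above, while a known-good board should return at least one endpoint.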


2. When did the failure happen? When did you buy the part, and when did you receive it?
The failure occurred around October 2025.


3. How did you discover the failure? Please describe it in detail.
We found that the OS failed to detect the CXL device, and confirmed that the issue persisted after a factory recovery.


4. In which part of your process did you find the issue (Lab, production, quality, etc.)?
Lab environment.

4.1 Was the device already in the field? How many times has it been used?
No. The device has only been used in a controlled lab environment for bring-up and testing.


5. How many units failed and how many units were used/tested by you? Which is the production code?
•    Failed units: 1
•    Units tested: Multiple Agilex FPGA boards
•    Production code: Not available
Only this unit exhibits the failure.


6. How did you determine the failure? Please elaborate on the procedures.
Multiple bring-up attempts were conducted using known-good hardware, software, and bitstreams.
•    6.1 Internal Debug: No internal physical failure analysis was performed.
•    6.2 Device Swap: Yes. Replacing the board with a known-good FPGA resolves the issue.


7. Was the failing unit ever working before the failure?
Yes. The device was functioning correctly before the failure.


8. How did you rule out electrical overstress (EOS) or electrostatic discharge (ESD)?
There is no visible physical damage on the FPGA or PCB.
The board has been handled according to standard ESD-safe lab procedures.


9. What are your expectations from this failure analysis?
Identify the root cause of the failure and restore proper CXL IP functionality, or provide a replacement device.


10. Have you re-balled your device? If yes, was it lead-free reballing?
No. The device has not been re-balled, and no third-party rework was performed.


11. Please add pictures of the device from the top and the bottom.
See attached.

12. Is there any other relevant information that could assist in the failure analysis?
No additional information at this time.


13. Are there any known changes to the process, materials, or design that could have contributed to the failure?
No.

13 Replies

  • JohnT_Altera
    Regular Contributor

    Hi,

     

    May I know which board OPN you are using? Can you also share the board serial number?

     

    1. Have you checked the status of the CXL PHY link on the FPGA?

    2. Has your server been tested and confirmed working with a CXL card before?

    3. Can you share your server information, such as the server model, CPU, and RAM?

     

    Thanks

    • nskim
      New Contributor

      Hi John,

      The board OPN is DK-DEV-AGI027RBES (Power Solution 2), and the serial number is AGIPCIe8020349.

      Regarding your questions:

      1. CXL PHY link status on the FPGA

      Yes, we have checked the CXL PHY link status on the FPGA, and it was functional.

      2. Previous validation of the server with a CXL card

      Yes, this server has been tested with the same CXL card before and was working correctly.

      3. Server information (model, CPU, and RAM)

      We tested on both SPR and EMR platforms. For the EMR platform, we are using a Supermicro SYS-221H-TNR with two Intel® Xeon® Gold 6538Y+ CPUs and four Samsung M321R8GA0BB0-CQKRH DDR5 DIMMs.

      • JohnT_Altera
        Regular Contributor

        Hi,

         

        2. Previous validation of the server with a CXL card

        Yes, this server has been tested with the same CXL card before and was working correctly.

        [JohnT] May I know which design was used when the server was confirmed working with this card? What changes have been made since then?

         

        Thanks