Forum Discussion

Dang_Tran__Frederic
1 year ago

N6000/PL-1 SmartNIC image deployment error

Hello,

I’ve installed an Intel N6000/1-PL SmartNIC on a Lenovo SR650v2 server with the following stack:

  • N6000 SKU1
  • CentOS Stream release 8
  • OPAE v2.1.1
  • kernel 5.15.92-dfl

Server BIOS settings: card tested on two slots (1 and 7) with PCIe bifurcation set to x8x8. Fan speed set to maximum.

The server BIOS reports the following warning:

PCIe error recovery has occurred in slot number 1. The adapter may not work correctly.

And dmesg contains:

[22638.864360] intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: SDM trigger failure: 4
[22638.877250] dfl-pci 0000:c5:00.1: enabling device (0140 -> 0142)
[22638.877568] dfl-pci 0000:c5:00.1: PCIE AER unavailable -5.
[22638.890287] dfl-pci 0000:c5:00.2: enabling device (0140 -> 0142)
[22638.890607] dfl-pci 0000:c5:00.2: PCIE AER unavailable -5.
[22638.904091] dfl-pci 0000:c5:00.3: enabling device (0140 -> 0142)
[22638.904377] dfl-pci 0000:c5:00.3: PCIE AER unavailable -5.
[22638.916944] dfl-pci 0000:c5:00.4: enabling device (0140 -> 0142)
[22638.917231] dfl-pci 0000:c5:00.4: PCIE AER unavailable -5.
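For anyone triaging a similar log, here is a minimal sketch that pulls the suspicious entries out of dmesg text like the excerpt above. The function name `find_warnings` and the keyword list are illustrative assumptions, not part of OPAE or the kernel; the line layout `[timestamp] driver device: message` is inferred only from the lines pasted here.

```python
import re

# Layout inferred from the excerpt: "[seconds.micros] driver device: message"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<device>\S+): (?P<message>.*)$"
)

# Hypothetical keyword list; extend it for your own triage.
KEYWORDS = ("failure", "unavailable")

SAMPLE = """\
[22638.864360] intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: SDM trigger failure: 4
[22638.877250] dfl-pci 0000:c5:00.1: enabling device (0140 -> 0142)
[22638.877568] dfl-pci 0000:c5:00.1: PCIE AER unavailable -5.
"""


def find_warnings(log_text):
    """Return (device, message) pairs whose message contains a keyword."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not in the expected layout, skip it
        message = match.group("message")
        if any(keyword in message.lower() for keyword in KEYWORDS):
            hits.append((match.group("device"), message))
    return hits


if __name__ == "__main__":
    for device, message in find_warnings(SAMPLE):
        print(f"{device}: {message}")
```

On a live system the same function could be fed the output of `dmesg` instead of `SAMPLE`.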

Trying to deploy an image results in the error included below.
Otherwise, the PCIe inventory and the fpgainfo command seem to work fine, as shown below.

Any help would be appreciated. Is this a hardware problem, an on-card BMC problem, or a software problem?

fpgasupdate --log-level debug ofs_top_page1_pacsign_user1.bin 0000:C5:00.0
[2024-01-29 05:07:27.46] [DEBUG ] fw file: ofs_top_page1_pacsign_user1.bin
[2024-01-29 05:07:27.46] [DEBUG ] addr: 0000:C5:00.0
[2024-01-29 05:07:27.46] [DEBUG ] hash256: b'e026976389252b8a746943f351e8f149e5f0415f620cd1e0618229eb79e01bb8'
[2024-01-29 05:07:27.46] [DEBUG ] hash384: b'bb04ea12557ce23f2cb75685669d794fb6a06bf7b590430aa8bfdb4c765c6e15ecdb38200e1599aa8a7e52a2958e20db'
[2024-01-29 05:07:27.46] [DEBUG ] file type: Static Region (Update)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.3 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)

[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.1 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)

[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.0 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)

[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.4 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)

[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.2 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)

[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.0 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)

[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl*.*/*spi*/spi_master/spi*/spi*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl*.*/spi_master/spi*/spi*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/spi*/spi_master/spi*/spi*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.4/n6000bmc-sec-update.3.auto/*fpga_sec_mgr*/*fpga_sec*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.4/n6000bmc-sec-update.3.auto/fpga_image_load/fpga_image*"
Traceback (most recent call last):
  File "/usr/bin/fpgasupdate", line 33, in <module>
    sys.exit(load_entry_point('opae.admin===1.4.1-', 'console_scripts', 'fpgasupdate')())
  File "/usr/lib/python3.6/site-packages/opae/admin/tools/fpgasupdate.py", line 789, in main
    if pac.upload_dev.find_one(os.path.join('update', 'filename')):
AttributeError: 'NoneType' object has no attribute 'find_one'
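One reading of the log above (an inference from the output, not confirmed against the fpgasupdate source): none of the five sysfs glob patterns matched, so `pac.upload_dev` ended up as `None` and the later `.find_one(...)` call raised the AttributeError. A minimal sketch for checking the same patterns by hand follows. The patterns are copied verbatim from the debug lines; the helper `first_match` and its `root` parameter are hypothetical and exist only to make the check testable.

```python
import glob

# Patterns copied verbatim from the "could not find" debug lines.
BASE = "/sys/class/fpga_region/region0/dfl-fme.0"
PATTERNS = [
    BASE + "/dfl*.*/*spi*/spi_master/spi*/spi*",
    BASE + "/dfl*.*/spi_master/spi*/spi*",
    BASE + "/spi*/spi_master/spi*/spi*",
    BASE + "/dfl_dev.4/n6000bmc-sec-update.3.auto/*fpga_sec_mgr*/*fpga_sec*",
    BASE + "/dfl_dev.4/n6000bmc-sec-update.3.auto/fpga_image_load/fpga_image*",
]


def first_match(patterns, root=""):
    """Return (pattern, sorted paths) for the first pattern that matches, else None.

    `root` is a hypothetical prefix so the check can run against a copy of
    the sysfs tree; leave it empty to query the live /sys.
    """
    for pattern in patterns:
        paths = sorted(glob.glob(root + pattern))
        if paths:
            return pattern, paths
    return None


if __name__ == "__main__":
    result = first_match(PATTERNS)
    if result is None:
        print("no secure-update sysfs node matched any pattern")
    else:
        print("matched:", result[0])
        for path in result[1]:
            print("  ", path)
```

If nothing matches on the live system, listing the contents of the `n6000bmc-sec-update.3.auto` directory would show what the loaded driver actually exposes there.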

lspci -vt

| +-02.0-[c3-c4]--+-00.0 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.1 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.2 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.3 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.4 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.5 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.6 Intel Corporation Ethernet Controller E810-C for backplane
| | \-00.7 Intel Corporation Ethernet Controller E810-C for backplane
| \-04.0-[c5]--+-00.0 Intel Corporation Device bcce
| +-00.1 Intel Corporation Device bcce
| +-00.2 Intel Corporation Device bcce
| +-00.3 Intel Corporation Device bcce
| \-00.4 Intel Corporation Device bcce


fpgainfo fme
Intel Acceleration Development Platform N6001
Board Management Controller NIOS FW version: 3.14.0
Board Management Controller Build version: 3.14.0
//****** FME ******//
Object Id : 0xEF00000
PCIe s:b:d.f : 0000:C5:00.0
Vendor Id : 0x8086
Device Id : 0xBCCE
SubVendor Id : 0x8086
SubDevice Id : 0x1771
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x5010202FAB46E6A
Bitstream Version : 5.0.1
Pr Interface Id : 00bc56cf-9e1f-5bf0-8011-48736ec862c9
Boot Page : user1
Factory Image Info : 801148736ec862c900bc56cf9e1f5bf0
User1 Image Info : 801148736ec862c900bc56cf9e1f5bf0
User2 Image Info : 801148736ec862c900bc56cf9e1f5bf0
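An observation on the values above: the Factory, User1, and User2 image info strings are identical, and each equals the Pr Interface Id with its dashes removed and its two 16-hex-digit halves swapped. The sketch below parses the `key : value` lines and checks that relationship. The function names are illustrative, and whether the tooling actually derives the image info this way is an assumption; the code only compares the pasted strings.

```python
SAMPLE = """\
Bitstream Version : 5.0.1
Pr Interface Id : 00bc56cf-9e1f-5bf0-8011-48736ec862c9
Boot Page : user1
Factory Image Info : 801148736ec862c900bc56cf9e1f5bf0
User1 Image Info : 801148736ec862c900bc56cf9e1f5bf0
User2 Image Info : 801148736ec862c900bc56cf9e1f5bf0
"""

IMAGE_KEYS = ("Factory Image Info", "User1 Image Info", "User2 Image Info")


def parse_fpgainfo(text):
    """Collect 'key : value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def swap_halves(hex32):
    """Swap the two 16-character halves of a 32-character hex string."""
    return hex32[16:] + hex32[:16]


def image_slots_match_pr_id(fields):
    """True if every image slot equals the half-swapped Pr Interface Id."""
    pr_id = fields["Pr Interface Id"].replace("-", "")
    return all(fields[key] == swap_halves(pr_id) for key in IMAGE_KEYS)


if __name__ == "__main__":
    parsed = parse_fpgainfo(SAMPLE)
    print("all slots match PR interface id:", image_slots_match_pr_id(parsed))
```

If the relationship holds in general, all three flash slots would carry the same image as the one currently loaded, but that conclusion rests on the assumption stated above.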

12 Replies

  • KianHinT_altera

    Hi Frederic,

    Sorry for the delay in replying to your post. Just a few questions:

    1) Does the FPGA card work (e.g. running an AFU test or your program) after you see the error and have run all the commands you listed (lspci, fpgainfo fme)?

    2) Did you do any prior flashing on the FPGA card before rebooting?

    3) Does the issue happen only once, or do you see the message "PCIe error recovery has occurred in slot number 1. The adapter may not work correctly." on every reboot?

    It might be related to this message: intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: SDM trigger failure: 4

    Regarding flashing SDM firmware, this is what I found in our engineering database:

    Downloading the SDM provisioning firmware requires a power cycle (this is an SDM requirement).

    Once the SDM provisioning firmware download and key provisioning are done, a power cycle is needed.

    Thanks

    Regards

    Kian

  • Hi Kian,

    1) Chicken-and-egg problem: since I cannot deploy any image on the board, I haven't been able to test it with any program (my end goal is to use the Intel P4 SDK with this card).

    2) The only thing I flashed on the card is a more recent BMC firmware (using a USB/JTAG cable). The initial version was 3.1. I upgraded it to 3.14, but to no avail:

    Board Management Controller NIOS FW version: 3.14.0
    Board Management Controller Build version: 3.14.0

    3) The problem occurs systematically, after any number of (cold) reboots.

    I'm not aware of the SDM firmware. Is it distinct from the BMC firmware?


    Regards

    Frederic

    • KianHinT_altera

      Hi Frederic,

      Thanks for the reply. So basically the FPGA board does not have any image on it yet, other than the BMC firmware on the MAX 10.

      I was trying to find which version is associated with Pr Interface Id : 00bc56cf-9e1f-5bf0-8011-48736ec862c9

      Anyway, I discussed this issue with my colleague here, and we should focus on why fpgasupdate fails with missing files. My thinking is that, because the card is non-functional without a valid image, it triggers the SDM (Secure Device Manager) to try to reconfigure the FPGA, and that attempt fails. The SDM firmware is separate from the BMC firmware but has some interface with it.

      Do you know which OFS version you installed on your system? I only saw that OPAE is 2.1.1; the DFL version is unknown, except that you are running kernel 5.15.92. Also, which Quartus version is installed on your system?

      Could you try using Quartus to program/flash the FPGA and see whether the FPGA is working?

      Thanks

      Regards

      Kian

  • Hi Kian,

    Regarding OFS version, I did not use an OFS installer script. I compiled the kernel using this branch of the linux-dfl project:

    git clone https://github.com/OPAE/linux-dfl.git -b fpga-ofs-dev-5.15-lts

    The Quartus version is 22.1.0 Build 174 03/30/2022 SC Pro Edition.

    My knowledge of Quartus (and low-level FPGA programming) being limited, I'm afraid I won't be able to program the card using Quartus unless a ready-to-use project is available.

    Regards,

    Frederic

    • KianHinT_altera

      Hi Frederic,

      Sorry for the delay in replying. I am setting up a server on my end to test this configuration.

      Would you mind providing the file that you tried to flash with this command: "fpgasupdate --log-level debug ofs_top_page1_pacsign_user1.bin 0000:C5:00.0"?

      I will try it on my end and see whether I can reproduce the same behavior.

      Thanks

      Regards

      Kian