Forum Discussion

jyoung
5 years ago
Solved

Issues with updating Arria10 PAC for AFU

Hello,

Platform info: Arria 10 GX PAC
Host System: Ubuntu 18.04 (4.15.0 kernel), Xeon Gold 6226R CPU dual-socket server

We have two Arria10 PAC cards that we are trying to run the AFU Getting ...
  • jyoung
    5 years ago

    I was finally able to update this using super-rsu after completely shutting off power to the server (cold reboot):

    []$ super-rsu --log-level trace /usr/share/opae/a10-gx-pac/super-rsu/base/rsu-09c4.json
    [2020-12-28 16:33:37,086] [DEBUG   ] [MainThread  ] - found fpga objects: ['/sys/class/fpga/intel-fpga-dev.0']
    [2020-12-28 16:33:37,088] [DEBUG   ] [MainThread  ] - found device at 0000:3d:00.0 -tree is
     [pci_address(0000:3a:00.0), pci_id(0x8086, 0x2030)]
        [pci_address(0000:3b:00.0), pci_id(0x10b5, 0x8747)]
            [pci_address(0000:3c:08.0), pci_id(0x10b5, 0x8747)]
                [pci_address(0000:3d:00.0), pci_id(0x8086, 0x09c4)]
            [pci_address(0000:3c:10.0), pci_id(0x10b5, 0x8747)]
                [pci_address(0000:3e:00.0), pci_id(0x198a, 0x385c)]
    
    [2020-12-28 16:33:37,096] [WARNING ] [MainThread  ] - Update starting. Please do not interrupt.
    [2020-12-28 16:33:37,097] [DEBUG   ] [MainThread  ] - [3d:00.0] version (0x0124000200000367) up to date for sr
    [2020-12-28 16:33:37,098] [DEBUG   ] [MainThread  ] - bmc_fw is being force flashed
    [2020-12-28 16:33:37,098] [DEBUG   ] [MainThread  ] - bmc_fw versions not equal (system:0x0000000000026889 != manifest:0x0000000000026895)
    [2020-12-28 16:33:37,098] [DEBUG   ] [MainThread  ] - bmc_fw versions not equal (system:0x0000000000026889 != manifest:0x0000000000026895)
    [2020-12-28 16:33:37,099] [DEBUG   ] [MainThread  ] - [3d:00.0] update timeout set to: 1200.0
    [2020-12-28 16:33:37,099] [DEBUG   ] [3d:00.0     ] - update of board at [pci_address(0000:3d:00.0), pci_id(0x8086, 0x09c4)] started
    [2020-12-28 16:33:37,099] [DEBUG   ] [MainThread  ] - max timeout set to: 0:20:00
    [2020-12-28 16:33:37,100] [DEBUG   ] [3d:00.0     ] - starting task: fpgasupdate /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4_bootloader-26895-fw_Release.bin 0000:3d:00.0
    [2020-12-28 16:33:37,222] [WARNING ] Update starting. Please do not interrupt.
    [2020-12-28 16:33:37,223] [INFO    ] updating from file /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4_bootloader-26895-fw_Release.bin with size 38016
    [2020-12-28 16:33:37,331] [INFO    ] writing to staging area
    [2020-12-28 16:34:36,173] [DEBUG   ] [MainThread  ] - waiting (0:19:00.927721) for threads: 3d:00.0
    [2020-12-28 16:34:36,674] [DEBUG   ] [MainThread  ] - waiting (0:19:00.426487) for threads: 3d:00.0
    (100%) [____________________] [38016/38016 bytes][Time:0:01:34.404933]
    [2020-12-28 16:35:11,747] [INFO    ] applying update to 0000:3d:00.0
    (100%) [____________________][Time:0:00:08.010363]
    [2020-12-28 16:35:19,757] [INFO    ] update of 0000:3d:00.0 complete
    [2020-12-28 16:35:19,758] [INFO    ] Secure update OK
    [2020-12-28 16:35:19,758] [INFO    ] Total time: 0:01:42.536032
    [2020-12-28 16:35:19,809] [DEBUG   ] [3d:00.0     ] - task completed in 0:01:42.707920
    [2020-12-28 16:35:19,809] [DEBUG   ] [3d:00.0     ] - starting task: fpgasupdate /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4-26895-fw_Release.bin 0000:3d:00.0
    [2020-12-28 16:35:19,932] [WARNING ] Update starting. Please do not interrupt.
    [2020-12-28 16:35:19,934] [INFO    ] updating from file /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4-26895-fw_Release.bin with size 244864
    [2020-12-28 16:35:20,039] [INFO    ] writing to staging area
    (100%) [____________________] [244864/244864 bytes][Time:0:00:01.575939]
    [2020-12-28 16:35:21,626] [INFO    ] applying update to 0000:3d:00.0
    [2020-12-28 16:35:36,247] [DEBUG   ] [MainThread  ] - waiting (0:18:00.853465) for threads: 3d:00.0
    [2020-12-28 16:35:36,748] [DEBUG   ] [MainThread  ] - waiting (0:18:00.352268) for threads: 3d:00.0
    (100%) [____________________][Time:0:00:43.055355]
    [2020-12-28 16:36:04,681] [INFO    ] update of 0000:3d:00.0 complete
    [2020-12-28 16:36:04,682] [INFO    ] Secure update OK
    [2020-12-28 16:36:04,682] [INFO    ] Total time: 0:00:44.749368
    [2020-12-28 16:36:04,702] [DEBUG   ] [3d:00.0     ] - task completed in 0:00:44.892688
    [2020-12-28 16:36:05,283] [INFO    ] [MainThread  ] - 1 board updated. A power-cycle is required.
    [2020-12-28 16:36:05,284] [INFO    ] [MainThread  ] - super_rsu.pyc completed in: 0:02:28.187105
    [2020-12-28 16:36:05,284] [INFO    ] [MainThread  ] - super-rsu exiting with code '0'
    
    # Check the FME with fpgainfo to make sure it is updated
    []$ fpgainfo fme
    Board Management Controller, microcontroller FW version 26895
    Last Power Down Cause: POK_CORE
    Last Reset Cause: None
    //****** FME ******//
    Object Id                     : 0xEB00000
    PCIe s:b:d:f                  : 0000:3D:00:0
    Device Id                     : 0x09C4
    Socket Id                     : 0x00
    Ports Num                     : 01
    Bitstream Id                  : 0x124000200000367
    Bitstream Version             : 1.2.4
    Pr Interface Id               : 38d782e3-b612-5343-b934-2433e348ac4c
    Boot Page                     : user
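
The check above can also be scripted. This is only a minimal sketch: the function names `bmc_fw_version` and `bmc_fw_is` are made up for illustration, and it assumes the BMC line is formatted exactly as in the `fpgainfo fme` output shown here ("microcontroller FW version" followed by the version).

```shell
# Sketch (hypothetical helpers): read `fpgainfo fme` output on stdin and
# print the BMC microcontroller firmware version.
# Assumes the line format shown in the output above.
bmc_fw_version() {
    sed -n 's/^.*microcontroller FW version[[:space:]]*\([0-9A-Za-z.]*\).*$/\1/p' | head -n 1
}

# Succeed only if the version read from stdin equals the expected one ($1).
# Fails if no version line is found at all.
bmc_fw_is() {
    actual="$(bmc_fw_version)"
    [ -n "$actual" ] && [ "$actual" = "$1" ]
}
```

For example, `fpgainfo fme | bmc_fw_is 26895 && echo "BMC firmware updated"` would confirm the version that the bmc_fw manifest entry in the log refers to.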

    I'm not totally sure why the fpga-otsu command originally failed to complete and then eventually succeeded. My best guess is that either switching to a slightly different kernel minor version or restarting the server with a cold reboot (powering the server off completely) helped reinitialize the FPGA state and the devices under /sys/. Warm reboots (a normal restart without removing power) may leave the FPGA device initialization in an odd state, which is why I recommend cold reboots (turning off power completely for 20-30 seconds).

    For the original issue with super-rsu, my suggested solution is the following:

    1) Follow the instructions to run fpga-otsu on pg. 40 of the AFU Quick Start Guide. If it fails, power the server off completely for ~30 seconds (cold reboot), power on, initialize the AFU devstack (`. /opt/inteldevstack/init_env.sh`) and rerun the command until it succeeds.

    2) Once fpga-otsu completes, perform a cold reboot again. This step should probably replace Step 2 on pg. 41, which reads "2. Power cycle the server." and can be taken to mean either a warm or a cold reboot.

    3) Check that the ifpga_sec_mgr module is loaded and that the ifpga_sec_mgr device exists. If the device does not exist, do a cold reboot, initialize the AFU devstack, and check again; repeat until it appears.

    []$ ls /sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/ifpga_sec_mgr/
    ifpga_sec0

    4) If this device exists, then the super-rsu command should complete successfully (or at least fail elsewhere).
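
Steps 3 and 4 can be wrapped in a small guard so super-rsu is only attempted once the device is visible. This is a rough sketch, not something from the Quick Start Guide: `sec_mgr_present` is a made-up name, the default path is the one from step 3 (board index 0), and matching on `ifpga_sec*` assumes the device name always starts the way `ifpga_sec0` does.

```shell
# Sketch (hypothetical helper): succeed if a security-manager device entry
# exists. Defaults to the sysfs path from step 3; pass another directory
# as $1 to check a different board.
sec_mgr_present() {
    dir="${1:-/sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/ifpga_sec_mgr}"
    # No ifpga_sec_mgr directory at all: the device is not there.
    [ -d "$dir" ] || return 1
    # Assumption: device entries are named ifpga_sec<N>, as in ifpga_sec0.
    for entry in "$dir"/ifpga_sec*; do
        [ -e "$entry" ] && return 0
    done
    return 1
}
```

With this, `sec_mgr_present && super-rsu /usr/share/opae/a10-gx-pac/super-rsu/base/rsu-09c4.json` runs the update only when the device exists; if the check fails, do another cold reboot and initialize the devstack before trying again.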