Issues with updating Arria10 PAC for AFU
- 5 years ago
I was finally able to update this using super-rsu after completely shutting off power to the server (cold reboot):
[]$ super-rsu --log-level trace /usr/share/opae/a10-gx-pac/super-rsu/base/rsu-09c4.json
[2020-12-28 16:33:37,086] [DEBUG ] [MainThread ] - found fpga objects: ['/sys/class/fpga/intel-fpga-dev.0']
[2020-12-28 16:33:37,088] [DEBUG ] [MainThread ] - found device at 0000:3d:00.0 -tree is
[pci_address(0000:3a:00.0), pci_id(0x8086, 0x2030)]
[pci_address(0000:3b:00.0), pci_id(0x10b5, 0x8747)]
[pci_address(0000:3c:08.0), pci_id(0x10b5, 0x8747)]
[pci_address(0000:3d:00.0), pci_id(0x8086, 0x09c4)]
[pci_address(0000:3c:10.0), pci_id(0x10b5, 0x8747)]
[pci_address(0000:3e:00.0), pci_id(0x198a, 0x385c)]
[2020-12-28 16:33:37,096] [WARNING ] [MainThread ] - Update starting. Please do not interrupt.
[2020-12-28 16:33:37,097] [DEBUG ] [MainThread ] - [3d:00.0] version (0x0124000200000367) up to date for sr
[2020-12-28 16:33:37,098] [DEBUG ] [MainThread ] - bmc_fw is being force flashed
[2020-12-28 16:33:37,098] [DEBUG ] [MainThread ] - bmc_fw versions not equal (system:0x0000000000026889 != manifest:0x0000000000026895)
[2020-12-28 16:33:37,098] [DEBUG ] [MainThread ] - bmc_fw versions not equal (system:0x0000000000026889 != manifest:0x0000000000026895)
[2020-12-28 16:33:37,099] [DEBUG ] [MainThread ] - [3d:00.0] update timeout set to: 1200.0
[2020-12-28 16:33:37,099] [DEBUG ] [3d:00.0 ] - update of board at [pci_address(0000:3d:00.0), pci_id(0x8086, 0x09c4)] started
[2020-12-28 16:33:37,099] [DEBUG ] [MainThread ] - max timeout set to: 0:20:00
[2020-12-28 16:33:37,100] [DEBUG ] [3d:00.0 ] - starting task: fpgasupdate /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4_bootloader-26895-fw_Release.bin 0000:3d:00.0
[2020-12-28 16:33:37,222] [WARNING ] Update starting. Please do not interrupt.
[2020-12-28 16:33:37,223] [INFO ] updating from file /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4_bootloader-26895-fw_Release.bin with size 38016
[2020-12-28 16:33:37,331] [INFO ] writing to staging area
[2020-12-28 16:34:36,173] [DEBUG ] [MainThread ] - waiting (0:19:00.927721) for threads: 3d:00.0
[2020-12-28 16:34:36,674] [DEBUG ] [MainThread ] - waiting (0:19:00.426487) for threads: 3d:00.0
(100%) [____________________] [38016/38016 bytes][Time:0:01:34.404933]
[2020-12-28 16:35:11,747] [INFO ] applying update to 0000:3d:00.0
(100%) [____________________][Time:0:00:08.010363]
[2020-12-28 16:35:19,757] [INFO ] update of 0000:3d:00.0 complete
[2020-12-28 16:35:19,758] [INFO ] Secure update OK
[2020-12-28 16:35:19,758] [INFO ] Total time: 0:01:42.536032
[2020-12-28 16:35:19,809] [DEBUG ] [3d:00.0 ] - task completed in 0:01:42.707920
[2020-12-28 16:35:19,809] [DEBUG ] [3d:00.0 ] - starting task: fpgasupdate /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4-26895-fw_Release.bin 0000:3d:00.0
[2020-12-28 16:35:19,932] [WARNING ] Update starting. Please do not interrupt.
[2020-12-28 16:35:19,934] [INFO ] updating from file /usr/share/opae/a10-gx-pac/super-rsu/base/a10sa4-26895-fw_Release.bin with size 244864
[2020-12-28 16:35:20,039] [INFO ] writing to staging area
(100%) [____________________] [244864/244864 bytes][Time:0:00:01.575939]
[2020-12-28 16:35:21,626] [INFO ] applying update to 0000:3d:00.0
[2020-12-28 16:35:36,247] [DEBUG ] [MainThread ] - waiting (0:18:00.853465) for threads: 3d:00.0
[2020-12-28 16:35:36,748] [DEBUG ] [MainThread ] - waiting (0:18:00.352268) for threads: 3d:00.0
(100%) [____________________][Time:0:00:43.055355]
[2020-12-28 16:36:04,681] [INFO ] update of 0000:3d:00.0 complete
[2020-12-28 16:36:04,682] [INFO ] Secure update OK
[2020-12-28 16:36:04,682] [INFO ] Total time: 0:00:44.749368
[2020-12-28 16:36:04,702] [DEBUG ] [3d:00.0 ] - task completed in 0:00:44.892688
[2020-12-28 16:36:05,283] [INFO ] [MainThread ] - 1 board updated. A power-cycle is required.
[2020-12-28 16:36:05,284] [INFO ] [MainThread ] - super_rsu.pyc completed in: 0:02:28.187105
[2020-12-28 16:36:05,284] [INFO ] [MainThread ] - super-rsu exiting with code '0'
#Check the fme with fpgainfo to make sure it is updated
[]$ fpgainfo fme
Board Management Controller, microcontroller FW version 26895
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ******//
Object Id : 0xEB00000
PCIe s:b:d:f : 0000:3D:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x124000200000367
Bitstream Version : 1.2.4
Pr Interface Id : 38d782e3-b612-5343-b934-2433e348ac4c
Boot Page : user
I'm not totally sure why the fpga-otsu command would not complete at first and then finally did. My best guess is that either the slightly different kernel minor version or the cold reboot (powering off the server) helped reinitialize the FPGA state and the devices under /sys/. Warm reboots (a normal power-cycle) may cause some weirdness with FPGA device initialization, which is why I recommend cold reboots (turning off power completely for 20-30 seconds).
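The fpgainfo check can also be scripted, which helps when you have to repeat it after several reboots. This is only a rough sketch: the function names `bmc_fw_version` and `bmc_fw_matches` are made up for illustration, and it assumes the `fpgainfo fme` output contains a "microcontroller FW version" line like the one in my output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OPAE). They read `fpgainfo fme` output on stdin.

# Print the BMC microcontroller firmware version, or nothing if the line is absent.
bmc_fw_version() {
    sed -n 's/^.*microcontroller FW version[[:space:]]*\([0-9A-Fa-f]*\).*$/\1/p' | head -n 1
}

# Return 0 only if a version was found and it equals the expected one ($1).
bmc_fw_matches() {
    local expected="$1"
    local actual
    actual="$(bmc_fw_version)"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For example, `fpgainfo fme | bmc_fw_matches 26895 && echo "BMC firmware is at 26895"` should print the message once the update has taken effect. The expected value 26895 comes from the manifest version in my super-rsu log; yours may differ.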
For the original issue with super-rsu, my suggested solution is the following:
1) Follow the instructions to run fpga-otsu on pg. 40 of the AFU Quick Start Guide. If it fails, power the server off completely for ~30 seconds (cold reboot), power on, initialize the AFU devstack (`. /opt/inteldevstack/init_env.sh`) and rerun the command until it succeeds.
2) Once fpga-otsu completes, perform a cold reboot again. This step should probably replace Step 2 on pg. 41, which says "2. Power cycle the server." and can be read as either a warm or a cold reboot.
3) Check that the ifpga_sec_mgr module is loaded correctly and that the ifpga_sec_mgr device exists. If it does not exist, try a cold reboot, and check again each time after initializing the AFU devstack.
`ls /sys/class/fpga/intel-fpga-dev.0/intel-fpga-fme.0/ifpga_sec_mgr/`
`ifpga_sec0`
4) If this device exists, then the super-rsu command should complete successfully (or at least fail elsewhere).
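The check in step 3 can be wrapped in a small guard so that super-rsu only runs when the security manager device is present. This is a minimal sketch, not an official procedure: `sec_mgr_present` is a name I made up, and the glob assumes the same sysfs layout as the path shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OPAE).
# Return 0 if any entry exists under an ifpga_sec_mgr directory of any FME.
# The sysfs root is an optional argument so the check can be tried on a test directory.
sec_mgr_present() {
    local root="${1:-/sys/class/fpga}"
    local d
    for d in "$root"/intel-fpga-dev.*/intel-fpga-fme.*/ifpga_sec_mgr/*; do
        # An unmatched glob stays literal, so test that the path really exists.
        [ -e "$d" ] && return 0
    done
    return 1
}

# Example use (left commented out, since it needs the real hardware):
# if sec_mgr_present; then
#     super-rsu /usr/share/opae/a10-gx-pac/super-rsu/base/rsu-09c4.json
# else
#     echo "ifpga_sec_mgr device missing: cold reboot and re-run init_env.sh" >&2
# fi
```

If the guard fails, go back to the cold reboot in step 3 rather than retrying super-rsu right away.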