Forum Discussion

EricOpitz
1 month ago
Solved

Agilex 5 RSU Reboot without any Image

Dear all,

I'm currently in the process of configuring RSU on our system by following this tutorial: https://altera-fpga.github.io/rel-24.2/embedded-designs/agilex-5/e-series/premium/rsu/ug-rsu-agx5e-soc/.

I've added a factory partition and activated the watchdog. When I don't service the watchdog, it triggers a cold reset and the system boots to the same application image again. After this happens 3 times, the system boots to the factory image. So this looks okay.

Now I want to test a corrupted image. To this end, I erase all application images (using rsu_client --erase under Linux) and reboot. I expect RSU to boot to the fallback image. However, the system doesn't reboot at all, and I don't see any output on the console after the shutdown.

If I now power-cycle the board it does boot to the fallback image.

I suspect the "reboot" command only triggers the "HPS warm reset", while the watchdog triggers the "HPS cold reset".

Is this expected behavior?

How can I configure the reboot to trigger the cold reset?

Attached you can find the log when rebooting and when applying a power cycle afterwards.

I use Quartus 25.1.1.

Kind Regards,

Eric Opitz

4 Replies

  • RolandoS_Altera

    Hello Eric

    I think that the behavior that you are seeing is expected.

    First, regarding the 'reboot' command in Linux: whether a cold or warm reset is applied depends on how you define the 'reboot' parameter in the kernel command line in U-Boot. We normally have something like this, in which the 'reboot' parameter is omitted, meaning that a cold reset will be applied.

    Kernel command line: console=ttyS0,115200 initrd=0x90000000 root=/dev/ram0 rw init=/sbin/init ramdisk_size=10000000 earlycon panic=-1 nosmp kvm-arm.mode=nvhe root=/dev/mmcblk0p2 rw rootwait

    To apply a warm reset, we would need to add reboot=warm to the command line, so I think you are actually applying a cold reset.
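
    To check which case applies on a running board, the command line the kernel actually booted with can be read from /proc/cmdline. A minimal sketch follows; reboot_param is a hypothetical helper name for illustration, not part of any RSU tooling:

```shell
#!/bin/sh
# Print the value of the last reboot= parameter in a kernel command line,
# or "unset" when the parameter is absent.
# The function body runs in a subshell so that "set -f" (no globbing while
# splitting the command line into words) does not leak into the caller.
reboot_param() (
    set -f
    mode="unset"
    for arg in $1; do
        case "$arg" in
            reboot=*) mode="${arg#reboot=}" ;;
        esac
    done
    echo "$mode"
)

# Usage on the target:
#   reboot_param "$(cat /proc/cmdline)"
```

    For the command line quoted above this prints "unset", which, per the explanation above, corresponds to a cold reset.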

    In the case of the watchdog timer, you can configure in the GHRD which action is taken when it expires. In our examples we configure it to trigger an RSU configuration.

    Second, keep in mind that the 'cold' reset we are talking about only concerns the HPS. The HPS is reset, including HPS memory and HPS OCRAM, but the SDM is not. The SDM is in charge of running the decision firmware, which checks the priority and integrity of the applications. So when we apply a reboot with cold reset, the SDM firmware (not the decision firmware) loads the same FSBL for the currently selected application (in this case the erased one), so nothing is loaded, and this is why we don't see any output on the serial console. On the other hand, a power cycle restarts everything, including the SDM, which then executes the decision firmware. The decision firmware checks the integrity of the application and, once it finds that it is corrupted, switches to the factory image.

    We have a command that tells the decision firmware to take action after the reboot command: 'rsu_client --request <slot num>'. With it, you tell the decision firmware to load the application from a given slot. If that application is corrupted, the decision firmware then checks the next-priority application.

    So, in order to achieve what you want to do you can try (assuming that the application you want to erase is in slot 0):

    root@linux:~# ./rsu_client --erase 0

    Operation completed

    root@linux:~# ./rsu_client --enable 0

    Operation completed

    root@linux:~# ./rsu_client --request 0

    Operation completed

    root@linux:~# reboot

    After this reboot, you should observe that the application loaded is the factory image.
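
    The sequence above can be wrapped so that it stops at the first failing step. This is only a sketch: rsu_request_erased_slot is a hypothetical name, the RSU_CLIENT path is an assumption to adjust for your image, and it assumes rsu_client returns a non-zero exit status on failure.

```shell
#!/bin/sh
# Path to the rsu_client binary (assumption: override RSU_CLIENT as needed).
RSU_CLIENT="${RSU_CLIENT:-./rsu_client}"

# Erase a slot, re-enable it, and request it for the next reboot,
# mirroring the manual --erase / --enable / --request sequence.
# Returns non-zero at the first failing step so that no reboot is
# issued with the slot only partly prepared.
rsu_request_erased_slot() {
    slot="$1"
    for step in --erase --enable --request; do
        if ! "$RSU_CLIENT" "$step" "$slot"; then
            echo "rsu_client $step $slot failed" >&2
            return 1
        fi
    done
    echo "slot $slot erased and requested; run 'reboot' to test the fallback"
}

# Usage on the target:
#   rsu_request_erased_slot 0 && reboot
```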

     

    • EricOpitz

      Hello Rolando,

      Thanks for the detailed response.

      Currently, if the active RSU image is corrupted in a production system, our customers would have to perform a manual power cycle so that the card boots to another RSU/factory image.

      Do you have any suggestions on how we can change our system so that the decision firmware is always active, even during reboots?

      One idea would be to implement a Linux reboot handler which asserts the nConfig reset pin via the FPGA.

      I also thought about calling "rsu_client --request" in the Linux reboot handler. But this command requires the target slot as an argument, which may be set by another software component on our system. It would be great if there were a command to activate the decision firmware during reboot without an argument, or a command to read back the slot that was requested.

      Kind Regards,

      Eric

      • RolandoS_Altera

        Hello Eric

        At this time, the only way to force decision firmware to act during a reboot is with the --request command.

         

        We could check with the SDM firmware team how feasible it would be to let the decision firmware act when the current application is corrupted. I think the current assumption is that if you already got to Linux, the application was not corrupted, but it's always possible that some Linux application has corrupted the current image in the QSPI. If this is accepted, it may take some time to get it implemented, so for now we need to rely on what we currently support.

         

        Something that could perhaps be done: early when Linux starts (before the application can get corrupted), we identify the slot used by the current application, download a copy of this application (with --copy), and keep this file in the file system. Then, in the reboot handler, we retrieve the slot number again and compare the slot's current content with the downloaded copy. If there is a mismatch, we restore the application using the --add option. In this way, we can guarantee that the current application is valid, regardless of whether any other Linux application has previously called the --request option.
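
        A rough sketch of that idea is shown here, under several assumptions that should be checked against your rsu_client build: that --copy and --add take the file name followed by "--slot <slot num>", that a slot has to be erased before --add, that rsu_client returns a non-zero exit status on failure, and that an image read back with --copy can be written again with --add unchanged. The function names and the way the slot number is obtained are hypothetical.

```shell
#!/bin/sh
# Path to the rsu_client binary (assumption: override RSU_CLIENT as needed).
RSU_CLIENT="${RSU_CLIENT:-./rsu_client}"

# Early at boot: save the content of slot $1 to file $2.
rsu_backup_slot() {
    "$RSU_CLIENT" --copy "$2" --slot "$1"
}

# In the reboot handler: read slot $1 back, compare it with the backup
# file $2, and rewrite the slot from the backup only on a mismatch.
rsu_restore_if_changed() {
    slot="$1"
    backup="$2"
    current="$(mktemp)" || return 1
    if ! "$RSU_CLIENT" --copy "$current" --slot "$slot"; then
        rm -f "$current"
        return 1
    fi
    if cmp -s "$current" "$backup"; then
        rm -f "$current"
        echo "slot $slot matches backup"
        return 0
    fi
    rm -f "$current"
    echo "slot $slot differs from backup, restoring"
    # Assumption: the slot must be erased before it can be programmed again.
    "$RSU_CLIENT" --erase "$slot" && "$RSU_CLIENT" --add "$backup" --slot "$slot"
}

# Usage (slot 0 and the backup path are placeholders):
#   rsu_backup_slot 0 /var/lib/rsu/slot0.bak        # early at boot
#   rsu_restore_if_changed 0 /var/lib/rsu/slot0.bak # in the reboot handler
```

        Keeping the comparison separate from the restore means the flash is only rewritten when the content has actually changed.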

        Not sure if this could be feasible for your project. I will talk with some of my peers to check if there is another option to solve this problem. If so, I will let you know.

        Thanks

        Rolando