Forum Discussion

New Contributor

1 month ago

Solved

Agilex 5 RSU Reboot without any Image

Dear all,

I'm currently in the process of configuring RSU on our system by following this tutorial: https://altera-fpga.github.io/rel-24.2/embedded-designs/agilex-5/e-series/premium/rsu/ug-rsu-agx5e-soc/.

I've added a factory partition and activated the watchdog. When I don't service the watchdog, it triggers a cold reset and the system boots the same application image again. After this happens 3 times, the system boots the factory image. So this looks okay.

Now I want to test a corrupted image. To this end, I erase all application images (using rsu_client --erase in Linux) and reboot. I expect RSU to boot the fallback image. However, the system doesn't reboot at all, and I don't see any output on the console after the shutdown.

If I now power-cycle the board, it does boot the fallback image.

I suspect the "reboot" command only triggers the "HPS warm reset", while the watchdog triggers the "HPS cold reset".

Is this expected behavior?

How can I configure the reboot to trigger the cold reset?

Attached you can find the log when rebooting and when applying a power cycle afterwards.

I use Quartus 25.1.1.

Kind Regards,

Eric Opitz

power_cycle_log.txt (3 KB)

reboot_log.txt (2 KB)

Hps Boot

RolandoS_Altera
1 month ago
Hello Eric
I think the behavior you are seeing is expected.
First, regarding the 'reboot' command in Linux: whether a cold or warm reset is applied depends on the 'reboot' parameter in the kernel command line, which is set in U-Boot. We normally use something like the following, which omits the 'reboot' parameter, meaning that a 'cold' reset is applied.
Kernel command line: console=ttyS0,115200 initrd=0x90000000 root=/dev/ram0 rw init=/sbin/init ramdisk_size=10000000 earlycon panic=-1 nosmp kvm-arm.mode=nvhe root=/dev/mmcblk0p2 rw rootwait
To apply a warm reset, we would need to add reboot=warm to the command line, so I think you are actually applying a cold reset.
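To confirm which setting a running system actually booted with, the kernel command line can be inspected from Linux. The following is a minimal sketch, assuming a POSIX shell on the target; `reboot_param` is a hypothetical helper name and not part of any Altera tool. It prints the value of the last reboot= parameter, or "unset" when the parameter is absent.

```shell
#!/bin/sh
# Hypothetical helper: report the reboot= kernel parameter.
# Argument 1 (optional): a kernel command line string.
# Without an argument, the running kernel's /proc/cmdline is read.
reboot_param() {
    if [ "$#" -gt 0 ]; then
        cmdline="$1"
    else
        cmdline="$(cat /proc/cmdline)"
    fi
    result="unset"
    # Split the command line on whitespace and keep the last reboot= value.
    for word in $cmdline; do
        case "$word" in
            reboot=*) result="${word#reboot=}" ;;
        esac
    done
    echo "$result"
}
```

Calling `reboot_param` with no argument on the target reads /proc/cmdline. A result of "unset" corresponds to the command line shown above, where the parameter is omitted.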
For the watchdog timer, you can configure the action taken when it expires in the GHRD. In our examples we configure it to trigger an RSU configuration.
Second, the 'cold' reset we are talking about only affects the HPS. The HPS is reset, including HPS memory and HPS OCRAM, but the SDM is not. The SDM is in charge of running the decision firmware, which checks the priority and integrity of the applications. So after a reboot with a cold reset, the SDM firmware (not the decision firmware) loads the same FSBL for the currently selected application (in this case the erased one). Nothing gets loaded, which is why you don't see any output on the serial console. On the other hand, a power cycle restarts everything, including the SDM. The SDM then executes the decision firmware, which checks the integrity of the application, finds that it is corrupted, and switches to the factory image.
There is a command that tells the decision firmware to take action after the reboot command: 'rsu_client --request <slot num>'. It tells the decision firmware to load the application from the given slot. If that application is corrupted, the decision firmware moves on to the next-priority application.
So, to achieve what you want, you can try the following (assuming that the application you want to erase is in slot 0):
root@linux:~# ./rsu_client --erase 0
Operation completed
root@linux:~# ./rsu_client --enable 0
Operation completed
root@linux:~# ./rsu_client --request 0
Operation completed
root@linux:~# reboot
With this, you should see that the factory image is the one loaded after the reboot.
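The same sequence can be wrapped in a small script so that a failing step stops the test before the reboot. This is only a sketch: `erase_and_request` is a hypothetical function name, and the `RSU_CLIENT` variable is an assumption added here so that the path to rsu_client can be changed or replaced by a stub for a dry run. Only the --erase, --enable and --request options shown above are used.

```shell
#!/bin/sh
# Path to rsu_client; override RSU_CLIENT for a dry run or a different location.
RSU_CLIENT="${RSU_CLIENT:-./rsu_client}"

# Hypothetical helper: erase a slot, re-enable it, and request it for the
# next reboot. Stops at the first failing step and returns non-zero.
erase_and_request() {
    slot="$1"
    # Accept only a non-empty, purely numeric slot number.
    case "$slot" in
        ''|*[!0-9]*)
            echo "usage: erase_and_request <slot num>" >&2
            return 2
            ;;
    esac
    $RSU_CLIENT --erase "$slot" || return 1
    $RSU_CLIENT --enable "$slot" || return 1
    $RSU_CLIENT --request "$slot" || return 1
}

# Usage on the target (not executed here):
#   erase_and_request 0 && reboot
```

Chaining with `&&` means the board only reboots if all three rsu_client calls reported success.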

4 Replies

SueC_Altera
Contributor
1 month ago
[deleted]
- EricOpitz
  New Contributor
  1 month ago
  Hello Rolando,
  Thanks for the detailed response.
  Currently, if the active RSU image gets corrupted on a production system, our customers would have to perform a manual power cycle so that the card boots another RSU/factory image.
  Do you have any suggestions for how we can change our system so that the decision firmware is always active, even during reboots?
  One idea would be to implement a Linux reboot handler that asserts the nConfig reset pin via the FPGA.
  I also thought about calling "rsu_client --request" in the Linux reboot handler. But this command requires the target slot as an argument, and the slot may have been set by another software component on our system. It would be great if there were a command to activate the decision firmware during reboot without an argument, or a command to read back the slot that was requested.
  Kind Regards,
  Eric
  - RolandoS_Altera
    Occasional Contributor
    1 month ago
    Hello Eric
    At this time, the only way to force decision firmware to act during a reboot is with the --request command.
    
    We could check with the SDM firmware team how feasible it would be to let the decision firmware act when the current application is corrupted. I think the current assumption is that if you already got to Linux, the image was not corrupted, but it is always possible that some Linux application corrupted the current image in the QSPI. If this is accepted, it may take some time to get implemented, so for now we need to rely on what is currently supported.
    
    Something that perhaps could be done: early after Linux starts (before the application can get corrupted), identify the slot used by the current application, download a copy of this application (with --copy), and keep the file in the file system. Then, in the reboot handler, retrieve the slot number again and compare the current content with the downloaded version. If there is a mismatch, restore the slot using the --add option. This way, we can guarantee that the current application is valid, regardless of any other Linux application that called the --request option earlier.
    I'm not sure whether this is feasible for your project. I will talk with some of my peers to check whether there is another option to solve this problem. If so, I will let you know.
    Thanks
    Rolando
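The copy-and-compare idea described above could look roughly like the sketch below. It is untested on hardware and rests on several assumptions: that rsu_client accepts `--copy <file> --slot <n>` and `--add <file> --slot <n>` in this form (check the usage text of your build), that `mktemp` and `cmp` are available on the target, and that the slot number is already known. The function names `save_reference` and `verify_or_restore` and the `RSU_CLIENT` variable are hypothetical.

```shell
#!/bin/sh
# Path to rsu_client; override RSU_CLIENT if it lives elsewhere.
RSU_CLIENT="${RSU_CLIENT:-./rsu_client}"

# Early after boot: dump the slot contents into a reference file.
# Usage: save_reference <slot num> <reference file>
save_reference() {
    $RSU_CLIENT --copy "$2" --slot "$1"
}

# In the reboot handler: dump the slot again, compare it with the reference,
# and rewrite the slot from the reference if the contents differ.
# Prints "ok" or "restored"; returns non-zero if a step fails.
# Usage: verify_or_restore <slot num> <reference file>
verify_or_restore() {
    slot="$1"
    ref="$2"
    tmp="$(mktemp)" || return 1
    if ! $RSU_CLIENT --copy "$tmp" --slot "$slot" >/dev/null; then
        rm -f "$tmp"
        return 1
    fi
    if cmp -s "$ref" "$tmp"; then
        rm -f "$tmp"
        echo "ok"
        return 0
    fi
    rm -f "$tmp"
    # Assumption: --add can write the reference back into this slot.
    $RSU_CLIENT --add "$ref" --slot "$slot" >/dev/null || return 1
    echo "restored"
}
```

If `--add` refuses to write into a slot that is still in use, an `--erase` of that slot beforehand may be needed. The reference file should also live on storage that the failure being guarded against cannot touch.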
