rsu_client failing to write to slot

Hello,

I am trying to exercise rsu_client (from Intel's Remote System Update feature) by erasing a partition on the flash, writing a new image to it, and loading that image on the next reboot.

This works almost every time, but occasionally the write step fails, and the only way I know to recover is to rewrite the flash with the JIC file.

I am wondering if someone can advise on how/why this could happen?

The feature works robustly most of the time, but this error requires manual intervention: connecting a JTAG cable.

Also, is it possible to recover from this using the existing rsu_client?

I have attached some of the output of the rsu_client for your reference.

I do not see any specific message related to the SPTs/CPBs or a QSPI read failure, either when running `dmesg` on the HPS or when inspecting the log in U-Boot.

Note that I do not think this is related to the flash being worn out from thousands of write cycles: the flash is new, and I am seeing this issue on multiple boards.

root@stratix10:~# rsu_client --log
      VERSION: 0x00000202
        STATE: 0x00000000
CURRENT IMAGE: 0x0000000001000000
   FAIL IMAGE: 0x0000000000000000
    ERROR LOC: 0x00000000
ERROR DETAILS: 0x00000000
RETRY COUNTER: 0x00000000
Operation completed

root@stratix10:~# rsu_client --list 0
      NAME: P1
    OFFSET: 0x0000000001000000
      SIZE: 0x01000000
  PRIORITY: 1
Operation completed

root@stratix10:~# rsu_client --list 1
      NAME: P2
    OFFSET: 0x0000000002000000
      SIZE: 0x01000000
  PRIORITY: [disabled]
Operation completed

root@stratix10:~# rsu_client --list 2
      NAME: P3
    OFFSET: 0x0000000003000000
      SIZE: 0x01000000
  PRIORITY: [disabled]
Operation completed

root@stratix10:~# rsu_client -y
DCMF0: OK
DCMF1: OK
DCMF2: OK
DCMF3: OK
Operation completed
root@stratix10:~# rsu_client -m
DCMF0 version = 23.1.0
DCMF1 version = 23.1.0
DCMF2 version = 23.1.0
DCMF3 version = 23.1.0
Operation completed

root@stratix10:~# rsu_client --erase 1
Operation completed

root@stratix10:~# rsu_client --add application.hps.rpd --slot 1
librsu: priority_add(): Compressing CPB [MED]
librsu: erase_dev(): error: Erase length 32768 not erase block aligned [LOW]
librsu: writeback_cpb(): error: Unable to ease CPBx [LOW]
ERROR: Failed to enable slot

Thank you!

EricOpitz
13 days ago
Dear all,
I had the same issue. Here is my solution for future reference.

I can reproduce this using the following script:
for i in {1..2000}; do
    echo "Iteration $i"
    rsu_client --erase 0 || break
    rsu_client --add update.rpd --slot 0 || break
done

Using this script the problem occurs every 508th iteration on my system. Once in the error state, I can only get out of it by flashing the .jic file. Running the commands in U-Boot did not work for me.

In iterations 1-507, rsu_client only performs a "write" to update the CPB. In the 508th iteration it performs an additional "erase" before the "write", and that erase is what fails. This is why the problem occurs so infrequently.
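The period of 508 would be consistent with a pointer block that fills up one entry per add and must be erased once it is full (the "Compressing CPB" message in the original log appears just before the failing erase). The sketch below is only arithmetic under an assumed layout: the 4 KiB block size, 32-byte header, and 8-byte entry size are assumptions and are not stated anywhere in this thread.

```shell
#!/bin/bash
# ASSUMED layout (not confirmed in this thread): a 4 KiB pointer block
# made of a 32-byte header followed by 8-byte image pointer entries.
cpb_size=4096
cpb_header=32
entry_size=8

# Number of pointer entries that fit before the block has to be erased.
cpb_pointer_slots() {
    echo $(( (cpb_size - cpb_header) / entry_size ))
}

cpb_pointer_slots   # prints 508 under the assumptions above
```

If the real layout differs, only the three variables at the top need to change.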

For me the fix was to increase the size of the SPT and CPB partitions to the erase size of my QSPI flash (64KiB) in the programming file generator file (.pfg).
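The alignment rule behind this fix can be checked with a few lines of shell. This is a minimal sketch, not part of librsu: the helper names `is_erase_aligned` and `round_up_to_erase_size` are made up for illustration, the 32768 value is the erase length from the error message in the original post, and 64 KiB is the erase size reported here.

```shell
#!/bin/bash
# Succeeds (exit status 0) when LENGTH is a whole multiple of ERASE_SIZE.
is_erase_aligned() {
    local length=$1 erase_size=$2
    (( erase_size > 0 && length % erase_size == 0 ))
}

# Rounds LENGTH up to the next multiple of ERASE_SIZE.
round_up_to_erase_size() {
    local length=$1 erase_size=$2
    echo $(( (length + erase_size - 1) / erase_size * erase_size ))
}

erase_size=$((64 * 1024))   # 64 KiB erase size reported in this thread
length=32768                # erase length from the librsu error message

if is_erase_aligned "$length" "$erase_size"; then
    echo "aligned"
else
    echo "not aligned; round up to $(round_up_to_erase_size "$length" "$erase_size")"
fi
```

On a Linux target, the erase size of each MTD device is listed in the erasesize column of /proc/mtd and in /sys/class/mtd/mtdN/erasesize, so the hard-coded value can be replaced with the one your flash reports.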

Kind Regards,
Eric Opitz

- BoonBengT_Altera
  Moderator
  10 days ago
  Hi EricOpitz,
  
  Thank you for taking the time to share the solution to this issue.
  
  We believe it will be a great help to others who run into similar issues in the future.
  
  Best regards,
  Altera Technical Support
aikeu
Regular Contributor
1 year ago
Hi Scotty2,

May I know whether rewriting the JIC file (the one that includes the factory image) is the only way you can recover from the intermittent update issue?
If you perform the RSU from U-Boot using the U-Boot rsu command, does the error still occur?

Thanks.
Regards,
Aik Eu
aikeu
Regular Contributor
1 year ago
Hi Scotty2,

Any follow up from the previous comment?

Thanks.
Regards,
Aik Eu
Scotty2
Occasional Contributor
1 year ago
Hello Aik!

Sorry for the late reply; for some reason I did not get an email update...

Can you kindly clarify your first question? Are you asking if the JIC file that I am using to rewrite the entire flash is the same JIC file that is already on it?

If so, yes. The JIC file is configured to have 3 partitions (and the backup image). Writing the JIC file gets the QSPI and RSU client out of this error state. Currently, the only way that I know to recover from this state is to use the original JIC file.
Regarding the second question, I will have to get back to you next week. I will run the test in U-Boot, as that requires a serial connection: I will try to erase partition 1 and add the .rpd file from U-Boot user space.
From Linux, I have tried requesting the factory image to be loaded once the board is in this erroneous state, and that operation succeeded without any problems. It is the writing of the new image to a partition that can cause this issue.
Thank you for your support!

Scotty2
Occasional Contributor
1 year ago
Hello,

I have run the commands in U-Boot and can confirm that they execute without an issue.

I also suspect that this fixes the intermittent Linux RSU client issue, because I can run the rsu_client commands in Linux again and they no longer fail.

While this is great to know for getting the boards out of the erroneous state without JTAG, it still requires a serial cable, and that is not always possible. I would prefer to fix the cause of the intermittent problem!
Any ideas why the Linux library/driver fails?

Here is the log of the U-Boot commands:
I requested the status log and the info for each slot, erased slot 1, loaded the .rpd file from the SD card into memory, used rsu to program it, verified it, and power cycled the board. After the power cycle, the log shows that the board booted from slot 1 (instead of slot 0).

SOCFPGA # rsu status_log
SPTs are GOOD!!!
CPBs are GOOD!!!
Current Image   : 0x01000000
Last Fail Image : 0x00000000
State           : 0x00000000
Version         : 0x00000202
Error location  : 0x00000000
Error details   : 0x00000000
Retry counter   : 0x00000000

SOCFPGA # rsu display_dcmf_version
SPTs are GOOD!!!
CPBs are GOOD!!!
DCMF0 version = 23.1.0
DCMF1 version = 23.1.0
DCMF2 version = 23.1.0
DCMF3 version = 23.1.0

SOCFPGA #  rsu slot_count
SPTs are GOOD!!!
CPBs are GOOD!!!
Number of slots = 3.

SOCFPGA # rsu slot_get_info 0
SPTs are GOOD!!!
CPBs are GOOD!!!
NAME: P1
OFFSET: 0x0000000001000000
SIZE: 0x01000000
PRIORITY: 1

SOCFPGA # rsu slot_get_info 1
SPTs are GOOD!!!
CPBs are GOOD!!!
NAME: P2
OFFSET: 0x0000000002000000
SIZE: 0x01000000
PRIORITY: [disabled]

SOCFPGA # rsu slot_get_info 2
SPTs are GOOD!!!
CPBs are GOOD!!!
NAME: P3
OFFSET: 0x0000000003000000
SIZE: 0x01000000
PRIORITY: [disabled]

SOCFPGA # rsu slot_erase 1
SPTs are GOOD!!!
CPBs are GOOD!!!
CPBs are GOOD!!!
Slot 1 erased.

SOCFPGA # load mmc 0:3 $loadaddr /kernel_backup/app.hps.rpd
462848 bytes read in 23 ms (19.2 MiB/s)

SOCFPGA # rsu slot_program_buf 1 $loadaddr $filesize
SPTs are GOOD!!!
CPBs are GOOD!!!
CPBs are GOOD!!!
Slot 1 was programmed with buffer=0x0000000001000000 size=462848.

SOCFPGA # rsu slot_verify_buf 1 $loadaddr $filesize
SPTs are GOOD!!!
CPBs are GOOD!!!
Slot 1 was verified with buffer=0x0000000001000000 size=462848.

# Power cycled the board

SOCFPGA # rsu slot_get_info 1
SPTs are GOOD!!!
CPBs are GOOD!!!
NAME: P2
OFFSET: 0x0000000002000000
SIZE: 0x01000000
PRIORITY: 1

SOCFPGA # rsu status_log
SPTs are GOOD!!!
CPBs are GOOD!!!
Current Image   : 0x02000000
Last Fail Image : 0x00000000
State           : 0x00000000
Version         : 0x00000202
Error location  : 0x00000000
Error details   : 0x00000000
Retry counter   : 0x00000000

Thank you

aikeu
Regular Contributor
1 year ago
Hi Scotty2,

Great to hear that the issue can be resolved using the U-Boot rsu command. I was not able to find a similar issue related to RSU failures under Linux; in my experience I mostly use the U-Boot rsu command.
I suspect that some QSPI flash operation is blocking the rsu_client operation, so please make sure the QSPI flash has been fully erased before programming the .jic file with the factory image.

Thanks.
Regards,
Aik Eu
aikeu
Regular Contributor
1 year ago
Hi Scotty2,

As we have not received any response from you to the previous reply, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help you with your follow-up questions.

Thanks.
Regards,
Aik Eu
