Forum Discussion

FabianL
1 month ago

Arria 10: Remote Update Factory Fallback won't work & Watchdog does not trigger

Hello,


I have to reopen another topic from last year:

Arria 10: Remote Update may brick FPGA and Factory Fallback won't work | Altera Community - 315011

Contrary to my comments in the original thread, enabling the watchdog does not trigger a factory fallback if the application image is misaligned.

This brings me back to this scenario of the original post:

  1. Invalid application load image location, i.e. the start of the application load is shifted by 1-10 bytes (manually induced error scenario) --> The reprogramming sequence starts but never completes, and no fallback to the factory load is performed. => The FPGA is completely unresponsive unless programmed via JTAG.

Obviously this might be an exotic error scenario. However, we require a robust setup and have to make sure that the FPGA remains accessible under any circumstances, so we need the factory fallback mechanism to work reliably!

 

We have this boot procedure:

    1. Boot into factory image (0x20 as boot address in flash boot sector 0x00 to 0x1F). We have certain hardware that is sensitive to boot-up timing, so we need this to guarantee an identical and reliable boot-up procedure.
    2. Boot from factory load into application image
      1. Check for power up boot: Read RU_RECONFIG_TRIGGER_CONDITIONS register for power up state (0)
        • do not reconfigure if Bit 4,2,1,0 is set
      2. Set AnF bit: write "1" to RU_CONFIGURATION_MODE
      3. Set application image address RU_PAGE_SELECT
      4. Enable watchdog: set RU_WATCHDOG_TIMEOUT and RU_WATCHDOG_ENABLE
      5. Reconfigure: write "1" to RU_RECONFIG
    3. In Application mode we only read the RU_RECONFIG_TRIGGER_CONDITIONS as status info
      • We do not write the RU_WATCHDOG_ENABLE nor RU_RESET_TIMER registers
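
For readers following along, the factory-side sequence in step 2 can be sketched roughly as follows. This is only an illustration, not our actual firmware: `read_reg` and `write_reg` are hypothetical callbacks standing in for whatever bus access the factory design uses, and registers are addressed by the names listed above rather than by real offsets.

```python
# Bits 4, 2, 1 and 0 of RU_RECONFIG_TRIGGER_CONDITIONS, as listed in step 2.1.
NO_RECONFIG_MASK = (1 << 4) | (1 << 2) | (1 << 1) | (1 << 0)


def boot_into_application(read_reg, write_reg, app_address, watchdog_timeout):
    """Run the factory-side sequence; return True if reconfiguration was triggered.

    read_reg(name) and write_reg(name, value) are hypothetical callbacks that
    access the Remote Update IP registers by the names used in the post.
    """
    conditions = read_reg("RU_RECONFIG_TRIGGER_CONDITIONS")
    if conditions & NO_RECONFIG_MASK:
        # One of the listed trigger bits is set, so this is not a plain
        # power-up boot: stay in the factory image.
        return False

    write_reg("RU_CONFIGURATION_MODE", 1)               # set AnF bit
    write_reg("RU_PAGE_SELECT", app_address)            # application image address
    write_reg("RU_WATCHDOG_TIMEOUT", watchdog_timeout)  # watchdog period
    write_reg("RU_WATCHDOG_ENABLE", 1)                  # arm the watchdog
    write_reg("RU_RECONFIG", 1)                         # start reconfiguration
    return True
```

For a dry run without hardware, `read_reg` can be the `get` method of a dictionary holding the trigger-condition value, and `write_reg` a function that appends each write to a list.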

I have run tests with an application image stored with an offset of -2 bytes, i.e. the first 2 bytes of the application image are not stored in flash memory and the whole image is shifted in its flash storage. In this case, the FPGA gets stuck in an unresponsive state when trying to load the application image.

There is no fallback to the factory load happening, no CRC error, no watchdog triggering.

My best guess is that this is related to the note in 1.3.1. Remote System Configuration Mode, which says that the factory fallback mechanism won't work for Arria 10 FPGAs if the last 576 bytes of the bitstream are corrupted.

Note: The fallback to the factory image does not work under the following conditions: If the last 576 bytes of an unencrypted application image bitstream are corrupted. Intel recommends that you examine the last 576 bytes of the unencrypted application image before triggering the application image configuration.

But I have noticed that the binary images of the FPGA bitstream vary in size. So there is no way to check explicit memory locations for these 576 Bytes. Is there any way to identify this section?

My Questions:

  1. Why is the factory configuration fallback mechanism not working in the above described scenario? The Factory load image is valid!
  2. How can I examine/validate a FPGA bitstream in flash memory before executing it?

 

best regards

Fabian

 

9 Replies

  • Farabi

    Note: The fallback to the factory image does not work under the following conditions: If the last 576 bytes of an unencrypted application image bitstream are corrupted. Intel recommends that you examine the last 576 bytes of the unencrypted application image before triggering the application image configuration.

    But I have noticed that the binary images of the FPGA bitstream vary in size. So there is no way to check explicit memory locations for these 576 Bytes. Is there any way to identify this section?

    My Questions:

    1. Why is the factory configuration fallback mechanism not working in the above described scenario? The Factory load image is valid!
    2. How can I examine/validate a FPGA bitstream in flash memory before executing it. 

     

    Status: consulting engineering to check on factory fallback mechanism failure and how to confirm the memory location of this 576 bytes is corrupted or not. 

     

    regards,
    Farabi

  • Farabi

    Hello Fabian, 

     

    I checked with internal team, the size of the bitstreams varies, and it does not have a fixed size. 

    Notes: The configuration bitstream is always the last block interpreted by FPGA, regardless of total image size. 

    So it is important to understand that the last 576 bytes are relative to the end of the image, not an absolute flash address. 

    This block is processed before the FPGA can even attempt a configuration. 

    It consists of : 

    1- Configuration end markers - signal end of bitstream

    2- CRC/Checksum data - to verify data integrity

    3- Device configuration info - to confirm compatibility

    4- RSU-related metadata - Required before fallback

     

    If corrupted: 

    1- The FPGA does not know that this image has failed

    2- The FPGA only knows that this image is invalid

    3- Impact: no fallback path is taken
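
Since the 576-byte block is relative to the end of the image, its flash location has to be computed from the image's start address and length. A minimal sketch, assuming the updater records both values when it writes the image (the function and parameter names here are hypothetical, not part of any Intel tool):

```python
TAIL_BYTES = 576  # size named in the user guide note quoted in this thread


def tail_flash_range(image_start, image_length, tail_bytes=TAIL_BYTES):
    """Return (first, end_exclusive) flash byte addresses of the image tail."""
    if image_length < tail_bytes:
        raise ValueError("image is shorter than the tail block")
    end = image_start + image_length
    return end - tail_bytes, end
```

What counts as the "end of the image" for this purpose is itself an assumption; this sketch simply uses the length of the data that was written to flash.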

     

    I am checking how to validate the bitstream before we can proceed with RSU. I will get back after getting the confirmed answer. 

     

    regards,
    Farabi

  • Farabi

    Hi Fabian, 

     

    1- Please don't use the 2-byte offset to trigger the CRC error. Instead, delete a chunk of bitstream data and re-run to trigger the CRC error.

    2- Can you compare the last 576 bytes of the RPD file with the last 576 bytes in your flash? The contents MUST match; if they do not, this area might be corrupted and could be the root cause of your fallback failure. 
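
The comparison in point 2 could be scripted along these lines. This is a rough sketch only: it assumes `flash_bytes` is a read-back of exactly the flash region the RPD was written to (same length as the RPD), and `tails_match` is a hypothetical helper name.

```python
def tails_match(rpd_bytes, flash_bytes, tail_bytes=576):
    """Compare the last `tail_bytes` bytes of the RPD and of the flash read-back.

    Returns (True, None) on a match, or (False, offset), where offset is the
    index of the first differing byte counted from the start of the tail.
    """
    rpd_tail = rpd_bytes[-tail_bytes:]
    flash_tail = flash_bytes[-tail_bytes:]
    if len(rpd_tail) != tail_bytes or len(flash_tail) != tail_bytes:
        raise ValueError("both inputs must hold at least tail_bytes bytes")
    for offset, (expected, actual) in enumerate(zip(rpd_tail, flash_tail)):
        if expected != actual:
            return False, offset
    return True, None
```

A match only shows that the flash holds what the RPD holds in that region; it says nothing about whether the RPD content itself is a valid trailer.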

     

    regards,
    Farabi

  • FabianL

    Hello Farabi,

     

    1. When I delete data somewhere in the middle of the bitstream, the CRC fallback mechanism works as expected. But that does not solve our problem when something goes wrong at the end of the bitstream.
    2. I checked the last 576 bytes of the RPD file and the flash contents. They actually do match.
      • The last 2664 Bytes of the RPD are 0xFF
      • Same applies for the Flash memory
    3. I also compared multiple RPD files (same FPGA Target 10AX027E3F29E2SG)
      • The end of the RPD always terminates with a sequence of 4 Byte 0x6A followed by multiple 0xFF Bytes.
      • The last 576 Bytes of the RPD always show 0xFF as content
      • The absolute number of 0xFF bytes varies between 1616 and 2664 bytes (could be other values as well, I only analyzed 8 different bitstreams).
      • ==> Given that, I have no clue how to validate the last 576 bytes and do not know what they should contain (except 0xFF).
    4. I just checked the newest datasheet of the Remote Update IP core: section 1.3.1. Remote System Configuration Mode (Version 2024.07.25) has a new note compared to Version 2022.08.16:
      • If the first 1024 bytes and the last 576 bytes of an encrypted application image bitstream are corrupted. Intel recommends that you examine the first 1024 bytes and the last 576 bytes of the encrypted application image before triggering the application image configuration.
      • This section does not make sense, as it is already included in the previous claim, that Fallback won't work if the last 576 Bytes are corrupted. Unless it is intended to be an OR combination, i.e. the Fallback won't work if the first 1024 or the last 576 Bytes are corrupted.
      • Could you please clarify this!
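
For reference, the tail pattern described in point 3 can be measured with a few lines of Python. This is only a sketch of my observation (four 0x6A bytes followed by 0xFF padding), not a documented format, and `describe_rpd_tail` is a hypothetical helper name. Passing this check does not prove that a bitstream is valid.

```python
def describe_rpd_tail(data):
    """Return (padding_length, marker_ok) for an RPD image held in `data`.

    padding_length is the number of trailing 0xFF bytes; marker_ok is True
    when the four bytes just before that padding are all 0x6A.
    """
    stripped = data.rstrip(b"\xff")      # drop the trailing 0xFF padding
    padding_length = len(data) - len(stripped)
    marker_ok = stripped.endswith(b"\x6a" * 4)
    return padding_length, marker_ok
```

Reading the file with `open(path, "rb").read()` and passing the result in gives the padding length to compare across builds.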

     

    To be able to validate the first 1024 and last 576 bytes of the bitstream, we have to know what they should look like.

    Thanks for your assistance.

     

    best regards

    Fabian

    • Farabi

      Hello Fabian, 

       

      "I when data somewhere in between the bitstream the CRC Fallback mechanism works as expected. But that does not solve our problem when somethng goes wrong at the end of the bitstream."

      [ANS] This is a known issue; it is a limitation of the RSU IP. 

       

      regards,

      Farabi

  • FabianL

    Hello Farabi,

     

    I understand that this is a limitation in the hardware of the RSU. However, we still require a workaround so that we can at least validate the vulnerable parts of the bitstream before we trigger the reconfiguration from our factory load.

     

    Currently a corrupted bitstream may brick our system, leaving it unresponsive, so that it must be sent back to the service center for manual repair by flashing a new image via JTAG.

    A reliable fallback mechanism for in-system update capabilities was an essential point in the decision for Arria 10 devices.

    Now we end up realizing that the fallback mechanism does not work reliably. Sorry, but we require at least a possibility to check an image before executing a reconfiguration. Unless we get information about the structure of the first 1024 and last 576 bytes of the bitstream, this is not possible. Ending up with a completely unresponsive system is not an option!

     

    Thanks for any further advice. 

     

    best regards

    Fabian

  • FabianL

    Hello Farabi,

     

    I'm sorry, but I must insist that we require information about how to check the vulnerable parts of the bitstream (first 1024 and last 576 bytes). Otherwise there is no way to deal with the risk of a bricked FPGA in case of update errors. 

    Could you please provide information about the structure of these parts of the bitstream? 

     

    Thanks for your assistance.

    Best regards

    Fabian

    • Farabi

      Hello Fabian, 

       

      I am escalating this to engineering. As far as I know the bitstream structure is confidential information. But let me check again with them. 

       

      regards,

      Farabi

    • Farabi

      Hello Fabian, 

       

      I have confirmed with engineering. The bitstream structure is confidential information. 

       

      regards,

      Farabi