Forum Discussion

FabianL
2 months ago
Solved

Arria 10: Remote Update Watchdog unpredicted behavior

Hello,

 

I am starting a new thread because the previous one stopped receiving answers after the forum moved from Intel to Altera. All details and data from the original thread are still valid:

Arria 10: Remote Update may brick FPGA and Factory Fallback won't work | Altera Community

Main problem is:

  1. We have two scenarios (see also here)
    1. Misaligned image:
      1. Enable the watchdog in the factory image
      2. Trigger reconfiguration (write 1 to RU_RECONFIGURATION_MODE & RU_RECONFIG)
      3. Reconfiguration fails due to the misaligned image --> watchdog triggers
      4. Fallback to factory mode
      5. ==> This case works as expected. Good case!
    2. Aligned, valid image:
      1. Enable the watchdog in the factory image
      2. Trigger reconfiguration (write 1 to RU_RECONFIGURATION_MODE & RU_RECONFIG)
      3. The application image starts. The application image does not service or actively disable the watchdog!
      4. Since the application image does not service the watchdog, I would expect a factory fallback when the watchdog expires. NOTE: We are not talking about further reconfiguration triggered from within the application image. We only trigger reconfiguration from within the factory image.
      5. ==> This does not happen, and I don't understand why. Or is the watchdog automatically disabled once a valid application image is loaded?
    3. Critical questions about the watchdog timeout register:
      1. What is the unit of the watchdog timeout register? This is not specified in its datasheet/documentation.
      2. Farabi stated "Please make sure the watchdog timeout not too. eg. Dont set RU_WATCHDOG_TIMEOUT = 0xFFF (this is too long)". Why is this too long? I cannot find any such restriction in the respective datasheet.

Please advise. Thanks for any help.

best regards

Fabian

  • Hello Fabian, 

     

    Thanks for the reply. 

     

    A functional error in this context means that the logic responsible for resetting the WD timer fails to do its task, which initiates the fallback mechanism. The third paragraph actually correlates with the first paragraph. In application configuration, factory fallback happens only when certain conditions are met, so expecting the fallback to occur while everything is working correctly is a misunderstanding. 

     

    regards,

    Farabi

14 Replies

  • Farabi

    Hello Fabian, 

     

    I am sorry, I missed this case on my dashboard. 

    I will continue support ASAP. 

     

    regards,

    Farabi

  • Farabi

    Hello Fabian, 

     

    Things to check: 

    1- The watchdog may not actually be enabled after reconfiguration. Please make sure you perform the steps below 

    a) set RU_WATCHDOG_TIMEOUT

    b) set RU_WATCHDOG_ENABLE = 1

    before you assert RU_RECONFIGURATION_MODE and RU_RECONFIG.
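
The ordering recommended above can be sketched as follows. This is only an illustration in Python against a fake bus that records writes: the register offsets, the `FakeBus` class, and the `trigger_application_load` helper are hypothetical and not taken from the Remote Update IP documentation, and the 12-bit range check is an assumption suggested by the 0xFFF value used in this thread.

```python
from dataclasses import dataclass, field

# Hypothetical register offsets, for illustration only. Take the real
# values from the register map of the Remote Update IP in your design.
RU_RECONFIGURATION_MODE = 0x00
RU_WATCHDOG_TIMEOUT = 0x01
RU_WATCHDOG_ENABLE = 0x02
RU_RECONFIG = 0x03


@dataclass
class FakeBus:
    """Stand-in for the real register access layer; it only records writes."""
    log: list = field(default_factory=list)

    def write(self, reg, value):
        self.log.append((reg, value))


def trigger_application_load(bus, timeout):
    """Arm the watchdog first, then request the reconfiguration."""
    if not 0 <= timeout <= 0xFFF:  # 12-bit width is an assumption
        raise ValueError("timeout must be in 0..0xFFF")
    bus.write(RU_WATCHDOG_TIMEOUT, timeout)   # step a)
    bus.write(RU_WATCHDOG_ENABLE, 1)          # step b)
    bus.write(RU_RECONFIGURATION_MODE, 1)     # select application mode
    bus.write(RU_RECONFIG, 1)                 # start reconfiguration last


if __name__ == "__main__":
    bus = FakeBus()
    trigger_application_load(bus, 0x100)
    for reg, value in bus.log:
        print(f"write reg 0x{reg:02X} <- 0x{value:X}")
```

Keeping the sequence in one helper makes the order easy to review and to compare against a bus trace from the real factory image.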

     

    2- Can you try setting the factory start address to the fixed address 0x0000_0020 and see whether that improves the fallback mechanism? 

     

    If you confirm steps #1 and #2, and make sure the application makes no writes to the RSU watchdog register or any other RSU control register until your app is ready to keep the image, the device should revert to the factory image after the timeout. 

     

    regards,
    Farabi

     

     

     

  • FabianL

    Hello,

     

    Thanks for the reply. The factory image location is fixed at 0x0000_0020 for all tests.

    We are executing the register accesses in this order:

    1. RU_RECONFIGURATION_MODE  = 1
    2. RU_WATCHDOG_TIMEOUT = 0xFFF
    3. RU_WATCHDOG_ENABLE = 1
    4. RU_RECONFIG = 1

    Does it make any difference if RU_RECONFIGURATION_MODE is set before the watchdog is enabled?

     

    As mentioned in the main topic, the watchdog works if the target application image is invalid/unaligned. In that case the fallback to factory mode is performed, which is not the case if I do not enable the watchdog in the factory image.

     

    best regards.

    Fabian

    • Farabi

      Hello, 

       

      If your application image is valid and working, the system will load the application image and no fallback to the factory image will happen. 

      Do you expect the system to fall back to the factory image after your application image has loaded successfully? 

       

      regards,

      Farabi

  • FabianL

    Hello Farabi,

     

    Actually, yes. From the datasheet section, I would interpret it to mean that the watchdog needs to be serviced or disabled by the application image and serves as a functional validation:

    Additionally, the remote update mode features an optional user watchdog timer that can detect functional errors in an application configuration.

    It seems even clearer from section 1.3.2. Remote System Configuration Components:

    Arria® 10 and Cyclone® 10 GX devices are equipped with a built-in watchdog timer for remote system configuration to prevent a faulty application configuration from indefinitely stalling the device.

    If the application configuration does not reset the user watchdog timer before time expires, the dedicated circuitry reconfigures the device with the factory configuration and resets the user watchdog timer.

    But maybe I have misunderstood these datasheet sections. 

    So you are saying that once the application image is successfully loaded, the watchdog is disabled and won't trigger any fallback anymore? 

    So it only covers the image loading procedure?

     

    best regards

    Fabian

    • Farabi

      Hello Fabian, 

       

      If your application image is not corrupted, the system will stay with the application image and not fall back to the factory image. Why do you want your system to load the factory image after a successful application image load? 

       

      regards,

      Farabi

    • Farabi

      Hello Fabian, 

       

      Can you respond to my last question? 

       

      regards,

      Farabi

      • FabianL

        Hello Farabi,

         

        I'm very sorry for my very late reply. As mentioned, that is what I have read from the official Intel/Altera datasheet. Please read my last post with the relevant citations.

         

        The datasheet explicitly states that the "application configuration" resets the watchdog. ==> Hence I expect that the application image needs to reset the watchdog.

        It also explicitly mentions "functional errors in an application configuration". If it is designed to cover functional errors and not only format errors (as with a corrupted application image), it requires interaction with the application configuration to allow functional verification, e.g. the watchdog could be serviced only after the configuration has checked that it is functional with all peripherals.

         

        Anyway, it is the Intel/Altera datasheet that makes this claim. If this is not the case, please clarify what these datasheet sections are intended to mean.

        Additionally, the remote update mode features an optional user watchdog timer that can detect functional errors in an application configuration.

        Arria® 10 and Cyclone® 10 GX devices are equipped with a built-in watchdog timer for remote system configuration to prevent a faulty application configuration from indefinitely stalling the device.

        If the application configuration does not reset the user watchdog timer before time expires, the dedicated circuitry reconfigures the device with the factory configuration and resets the user watchdog timer.

  • FabianL

    Thanks for the clarification. So this is clearly the expected behavior, and that is fine for our application.

    I'm still missing an answer regarding the watchdog timeout:

    1. What is the unit of the watchdog timeout register? This is not specified in its datasheet/documentation.
    2. You stated "Please make sure the watchdog timeout not too. eg. Dont set RU_WATCHDOG_TIMEOUT = 0xFFF (this is too long)". Why is this too long? I cannot find any such restriction in the respective datasheet.

     

    best regards

    Fabian

    • Farabi

      Hello Fabian, 

       

      One unit of the watchdog timer is 1/f, where f is the frequency of the internal oscillator. 

      Based on the datasheet, one unit of time should nominally be 1/(7.9 MHz).  
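
As a rough worked example of that statement, the sketch below converts a register value into a nominal timeout. The function name and the `ticks_per_lsb` parameter are hypothetical. With `ticks_per_lsb=1` it follows the 1/f unit given above; a larger factor would apply if the register only set the upper bits of a wider counter, which this thread does not confirm and which should be checked against the Remote Update IP user guide.

```python
NOMINAL_OSC_HZ = 7.9e6  # nominal internal oscillator frequency quoted above


def watchdog_timeout_seconds(reg_value, f_osc_hz=NOMINAL_OSC_HZ, ticks_per_lsb=1):
    """Return the nominal timeout in seconds for a timeout register value.

    ticks_per_lsb is an assumption: 1 means one register unit equals one
    oscillator period (1/f), as stated in this thread.
    """
    if not 0 <= reg_value <= 0xFFF:  # 12-bit range assumed from the 0xFFF example
        raise ValueError("reg_value must be in 0..0xFFF")
    if f_osc_hz <= 0 or ticks_per_lsb <= 0:
        raise ValueError("frequency and scale must be positive")
    return reg_value * ticks_per_lsb / f_osc_hz


if __name__ == "__main__":
    t = watchdog_timeout_seconds(0xFFF)
    print(f"0xFFF at 7.9 MHz, 1 tick per unit: {t * 1e6:.0f} us")
```

Under the 1/f reading, 0xFFF corresponds to roughly half a millisecond at the nominal frequency. Since 7.9 MHz is only the nominal value, the result is an estimate rather than a guaranteed timeout.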

       

      regards,

      Farabi

       

    • Farabi

      Hello Fabian, 

       

      Do you have any further questions? 

       

      regards,

      Farabi

      • FabianL

        Hello Farabi,

        Actually, yes. As pointed out in my last post, I'm still missing answers concerning the watchdog timeout:

        1. What is the unit of the watchdog timeout register? This is not specified in its datasheet/documentation.
        2. You stated "Please make sure the watchdog timeout not too. eg. Dont set RU_WATCHDOG_TIMEOUT = 0xFFF (this is too long)". Why is this too long? I cannot find any such restriction in the respective datasheet.

         

        best regards

        Fabian