Forum Discussion

JChen576
New Contributor
5 years ago

Stratix 10 SX SoC Development Kit fails to boot randomly

Hi All,

We recently bought a Stratix 10 SX SoC Development Kit (model name: DEV KIT DKSOC1SSXLA) and received an extra HPS HiLo DDR card this week.

We plugged the DDR card into the evaluation board and followed the instructions in the "QUICK START GUIDE" from the box.

However, we found the kit to be extremely unstable.

Sometimes it boots into Linux; other times it hangs during boot or shows an error message, seemingly at random.

The flash image is the original default image that shipped in the box.

We read Intel's documents and compared them with the settings on the board, but found no clue.

Has anyone had this issue before? How can I solve this problem?

Thanks

These are some of the error messages shown while the system boots:

* Case one:

===================================================================

U-Boot SPL 2017.09 (Sep 22 2018 - 07:29:05)

MPU 1000000 kHz

L3 main 400000 kHz

Main VCO 2000000 kHz

Per VCO 2000000 kHz

EOSC1 25000 kHz

HPS MMC 50000 kHz

UART 100000 kHz

DDR: Initializing Hard Memory Controller

DDR: Calibration success

SDRAM: Initializing ECC 0x00000000 - 0x80000000

SDRAM-ECC: Initialized success with 1343 ms

DDR: HMC init success

DDR: 2048 MiB

DDR: Running SDRAM size sanity check

DDR: SDRAM size check passed!

QSPI: Reference clock at 400000000 Hz

Trying to boot from MMC1

"Synchronous Abort" handler, esr 0x96000210

ELR: ffe08efc

LR: ffe08e04

x 0: 0000000000000000 x 1: 0000000000018404

x 2: 00000000a0000037 x 3: 0000000000000015

x 4: 000000003fa00320 x 5: 000000000000006c

x 6: 00000000ffe12f63 x 7: 0000000000000003

x 8: 0000000000000230 x 9: 0000000000000080

x10: 00000000ffe3dbec x11: 00000000ffe12a10

x12: 0000000000000176 x13: 0000000000000454

x14: 00000000ffe3dd6c x15: 00000000ffe12a10

x16: 0000000000030a0f x17: f63f9fa5faadbfed

x18: 00000000ffe3de90 x19: 000000003fa00800

x20: 0000000000000000 x21: 00000000ffe3dd00

x22: 00000000ffe3dbe0 x23: 00000000000007bd

x24: 0000000000000029 x25: 0000000000000001

x26: 0000000080020000 x27: 122c43677d8ff77f

x28: df6e6e611f01fbfd x29: 00000000ffe3dc20

Resetting CPU ...

===================================================================
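For reference, the esr value printed by the abort handler above (0x96000210) can be split into its architectural fields. The following is a minimal sketch, assuming the standard AArch64 ESR_ELx layout; the function name and the two small lookup tables are illustrative and intentionally incomplete.

```python
# Sketch: decode an AArch64 ESR_ELx value as printed by the U-Boot abort handler.
# Field positions (EC, IL, ISS, and the Data Abort ISS sub-fields) follow the
# Arm architecture; the name tables below are deliberately partial.

EC_NAMES = {
    0x21: "Instruction Abort, same Exception level",
    0x25: "Data Abort, same Exception level",
}

DFSC_NAMES = {
    0x10: "Synchronous External abort, not on translation table walk",
    0x21: "Alignment fault",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into its main fields."""
    ec = (esr >> 26) & 0x3F      # exception class, bits [31:26]
    il = (esr >> 25) & 0x1       # instruction length bit
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits [24:0]
    fields = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in table"),
        "il": il,
        "iss": iss,
    }
    if ec in (0x24, 0x25):       # Data Abort: decode the ISS sub-fields
        dfsc = iss & 0x3F        # data fault status code, bits [5:0]
        fields["wnr"] = (iss >> 6) & 0x1   # 0 = read access, 1 = write access
        fields["ea"] = (iss >> 9) & 0x1    # external abort type bit
        fields["dfsc"] = dfsc
        fields["dfsc_name"] = DFSC_NAMES.get(dfsc, "not in table")
    return fields


if __name__ == "__main__":
    for key, value in decode_esr(0x96000210).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

Read this way, the value describes a Data Abort taken at the current exception level, caused by a synchronous external abort on a read. That points at the bus or memory system returning an error for a load rather than at an MMU translation fault, although on its own it does not identify which device or address was involved.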

* Case two:

===================================================================

U-Boot SPL 2017.09 (Sep 22 2018 - 07:29:05)

MPU 1000000 kHz

L3 main 400000 kHz

Main VCO 2000000 kHz

Per VCO 2000000 kHz

EOSC1 25000 kHz

HPS MMC 50000 kHz

UART 100000 kHz

DDR: Initializing Hard Memory Controller

DDR: Triggerring emif_reset

DDR: emif_reset triggered successly

DDR: Triggerring emif_reset

DDR: emif_reset triggered successly

DDR: Triggerring emif_reset

DDR: emif_reset triggered successly

DDR: Error as SDRAM calibration failed

DDR: Initialization failed.

### ERROR ### Please RESET the board ###

===================================================================
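Because the failure is intermittent, it may help to count how often each of the two signatures above appears across many power cycles. The sketch below assumes the serial console output of repeated boots has been saved as one text capture; the function names and category labels are hypothetical, and the matched strings are copied from the logs above (including the spelling "Triggerring" used by this SPL build).

```python
import re
from collections import Counter

# Failure signatures taken verbatim from the SPL output quoted in this thread.
SIGNATURES = [
    ("ddr_calibration_failed", re.compile(r"DDR: Error as SDRAM calibration failed")),
    ("synchronous_abort", re.compile(r'"Synchronous Abort" handler, esr 0x[0-9a-fA-F]+')),
]


def split_boots(capture):
    """Split a serial capture into one string per boot attempt.

    Each attempt is assumed to start with the "U-Boot SPL " banner.
    """
    parts = re.split(r"(?=U-Boot SPL )", capture)
    return [part for part in parts if part.startswith("U-Boot SPL ")]


def classify_boot(log):
    """Return the label of the first known failure signature found in one boot log."""
    for label, pattern in SIGNATURES:
        if pattern.search(log):
            return label
    return "no_known_failure"


def emif_reset_retries(log):
    """Count how many times the SPL retried the EMIF reset in one boot log."""
    return log.count("DDR: Triggerring emif_reset")


def summarize(capture):
    """Count boot attempts per category across a whole capture."""
    return Counter(classify_boot(boot) for boot in split_boots(capture))
```

For example, `summarize(open("serial_capture.txt").read())` (the file name is a placeholder) returns a counter of categories, which makes it easier to tell whether a change such as reseating the DDR card or updating the bootloader shifts the failure rate.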

8 Replies

  • Hi,

    Thanks for the information. I noticed that you are using an old version of U-Boot. May I also know which Quartus version you are working with?

    We recommend that you use the latest supported version of U-Boot. The links below include guidance on booting directly using the prebuilt GHRD image.

    https://rocketboards.org/foswiki/Documentation/S10GSRDBootLinuxFromSDCard180

    Prebuilt image

    https://rocketboards.org/foswiki/Documentation/GSRDTagging

    Creating the boot loader

    https://rocketboards.org/foswiki/Documentation/BuildingBootloader

    • JChen576
      New Contributor

      Thanks for your reply, @EberL_Intel.

      I only have one Stratix 10 SX dev board. I followed the guide and still get the error.

      The .sof I tried to boot over JTAG is https://releases.rocketboards.org/release/2019.04/gsrd/s10_gsrd/ghrd_1sx280lu2f50e2vg_hps.sof and the same problem occurs.

      There was no original HPS HiLo DDR4 card in our kit box when it arrived.

      I believe the reason is given in https://www.intel.com/content/dam/altera-www/global/en_US/support/boards-kits/stratix10/dcl-ddr4-hilo-notice.pdf .

      So we obtained a separate HPS HiLo DDR4 card with our local vendor's help and installed it on the mainboard.

      The HPS HiLo DDR4 card is "MEM MODULE HILDCDDR44GA DDR4 HiLo Daughter Card". The label on the card is "ALTERA DDR4 X72 DAUGHTER CARD".

      We are trying to get another HPS HiLo DDR4 card from the local vendor, but it will take weeks to arrive.

      In the meantime, we ran the Board Test System (BTS).

      BTS shows many "Detected Errors" in the DDR tab.

      In addition, the results for "FMCA", "FMCB", and so on do not all match the user guide (Intel Stratix 10 SX SoC Development Kit User Guide).

      For example, the FMCB tab shows "PLL lock: Partially Locked" in our BTS test.

      Our BTS test follows the instructions in "5. Board Test System".

      One of our questions is whether the BTS results can be relied on.

      If the BTS is correct, does it mean we should exchange the dev kit for one with a pre-installed HPS HiLo DDR4 card from our local vendor?

      Regards,

  • VWoll
    New Contributor

    Hi,

    We are also having the exact same issue with our production boards. In some cases, our board does not boot.

    This occurs on our own design, but we are observing the exact same problems as above. More specifically our device is:

    FPGA: 1SX280HU3F50E2VG

    We are compiling this design using Quartus Prime Version 19.4.0 Build 64 12/04/2019 SC Pro Edition

    This error does not always occur. If you leave the board powered on for long enough, it eventually starts to work, but not always.

    Some additional observations:

    1) When this occurs, JTAG programming is also likely to fail. We do not know why.

    2) The likelihood of this happening increases if we spray the board with flux remover.

    We get the following error messages:

    U-Boot SPL 2017.09-00187-g70eb145123 (Apr 06 2020 - 19:11:55)
    MPU         1000000 kHz
    L3 main     400000 kHz
    Main VCO    2000000 kHz
    Per VCO     2000000 kHz
    EOSC1       125000 kHz
    HPS MMC     50000 kHz
    UART        100000 kHz
    DDR: Initializing Hard Memory Controller
    DDR: Triggerring emif_reset
    DDR: emif_reset triggered successly
    DDR: Triggerring emif_reset
    DDR: emif_reset triggered successly
    DDR: Triggerring emif_reset
    DDR: emif_reset triggered successly
    DDR: Error as SDRAM calibration failed
    DDR: Initialization failed.
    ### ERROR ### Please RESET the board ###

    Or, we get this:

    U-Boot SPL 2017.09-00187-g70eb145123 (Apr 06 2020 - 19:11:55)
    MPU         1000000 kHz
    L3 main     400000 kHz
    Main VCO    2000000 kHz
    Per VCO     2000000 kHz
    EOSC1       125000 kHz
    HPS MMC     50000 kHz
    UART        100000 kHz
    DDR: Initializing Hard Memory Controller
    DDR: Calibration success
    SDRAM: Initializing ECC 0x00000000 - 0x80000000
    SDRAM-ECC: Initialized success with 1357 ms
    DDR: HMC init success
    DDR: 2048 MiB
    DDR: Running SDRAM size sanity check
    DDR: SDRAM size check passed!
    QSPI: Reference clock at 400000000 Hz
    Trying to boot from MMC1
    "Synchronous Abort" handler, esr 0x96000210
    ELR:     ffe08fb0
    LR:      ffe09034
    x 0: 0000000000000001 x 1: 00000000000003e8
    x 2: 0000000000000020 x 3: 0000000000000015
    x 4: 00000000ffe3d640 x 5: 0000000000000001
    x 6: 0000000000000040 x 7: 00000000ffe3db00
    x 8: 0000000000000200 x 9: 0000000000000080
    x10: 0000000080000010 x11: 0000000080000014
    x12: 0000000000000176 x13: 0000000000000454
    x14: 00000000ffe3dd6c x15: 00000000ffe12a60
    x16: 0000000000030a10 x17: e3fb9eb93f97afdf
    x18: 00000000ffe3de90 x19: 000000003fa00800
    x20: 00000000ffe3d7a8 x21: 0000000000000010
    x22: 00000000ffe3db00 x23: 0000000000000004
    x24: 0000000000000400 x25: 000000000003a980
    x26: 00000000000008a4 x27: 000000000000be80
    x28: 0000000000000010 x29: 00000000ffe3d6a0
     
    Resetting CPU ...
     
    resetting ...
    Mailbox: Issuing mailbox cmd REBOOT_HPS

    After which it fails again.

    We're very interested in what you recommend, and how to fix this.

    • VWoll
      New Contributor

      By the way, we have observed this problem on 2 other production boards, but it is very inconsistent, and we do not have a way to reliably trigger this problem. Sometimes the issue lasts for a day or two, before clearing up on its own.

      This seems to be similar to the issue reported here: https://github.com/kraj/meta-altera/issues/164

      In that case, it looks like this may have to do with the SDM or CMF firmware. During compilation, using 19.4, we get the following warning:

      Warning (19729): Current CMF data structure hash (0xA2C420AC) is older version than latest CMF data structure but still allowable.

      This might be transition period. You should update your CMF to latest version with hash { 0x9603E739 } [Add operation to send JTAG ID to LSM]

      Does this have some bearing on the issue?
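
To check whether other builds carry the same CMF warning, a saved Quartus message log can be scanned for it. This is only a sketch: the pattern is based solely on the warning text quoted above, the function name is hypothetical, and whether other Quartus versions word the message identically is an assumption.

```python
import re

# Pattern built from the wording of Warning (19729) as quoted in this thread.
# DOTALL lets the match span the line break between the two hash values.
CMF_WARNING = re.compile(
    r"Warning \(19729\): Current CMF data structure hash \((0x[0-9A-Fa-f]+)\)"
    r".*?hash \{ (0x[0-9A-Fa-f]+) \}",
    re.DOTALL,
)


def find_cmf_hashes(report_text):
    """Return (current_hash, recommended_hash) if the warning is present, else None."""
    match = CMF_WARNING.search(report_text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Running `find_cmf_hashes` on the message text of each build gives a quick way to record which CMF hash a given bitstream was compiled with, so that boot failures can be compared against it.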