Forum Discussion

BWU15
6 years ago

We want to apply for FA (failure analysis)

Yesterday we found that one Dagger unit failed at the 2C test with failure code “GET_ROMMON”. Based on the failure log and a preliminary debug analysis (a process issue was ruled out by visual inspection and 2D/5D X-ray), the failure is related to U1_F1 (the FPGA). An A-B-A swap test confirmed that it is a genuine component issue, so we need to send the part for FA.

11 Replies

  • Hi Bruce,

    Thank you for contacting Intel Community. Please note that Intel FPGA requires detailed information for a failure analysis request. Please answer the questions below so that we can better understand your situation.

    1. Please provide full device details.
       1.1. Device name:
       1.2. Full part number:
    2. What is the failure rate? What is the failure rate vs. tested sample? Example: 2 out of 100 units.
    3. What is the failure symptom? Please elaborate the failure symptom in detail.
    4. When did the failure happen? How did you discover the failure?
    5. How did you determine the failure? Please elaborate the procedures.
    6. Did the failing unit ever work before the failure?
    7. Did they violate solder re-flow temperature profiles, moisture sensitivity? Please provide the re-flow temperature profiles.
    8. Did you swap the failure device to a known good board? Is the failure following the device or board?
    9. Is this a prototype build or volume/mass production?
    10. Kindly provide quantitative investigation results that could prove the failure is Intel FPGA induced.

    Thank you

    Regards,
    Chia Ling
  • BWU15

    1. Please provide full device details.

    Device name: IC, PLD-FPGA, EP4CGX75D-7, FBGA672, 1.2V, 1.0mm, PB-Free, C-TEMP (0 to 70°C)

    Full part number: CISH-16-4405-01 / EP4CGX75DF27C7N

    2. What is the failure rate? What is the failure rate vs. tested sample? Example: 2 out of 100 units.

    1 out of 350 units (1/350 = 0.29%)

    3. What is the failure symptom? Please elaborate the failure symptom in detail.

    “GET_ROMMON” failure at the 2C station.

    4. When did the failure happen? How did you discover the failure?

    2019/02/14. The board failed at the 2C station with failure code “GET_ROMMON”, and the POST code LEDs (CR0/CR3/CR4/CR6_P80) stayed solidly lit.

    5. How did you determine the failure? Please elaborate the procedures.

    The board first failed to get ROMMON at 0°C. On retest it also failed at room temperature (25°C), and we can reproduce the failure at room temperature.

    6. Did the failing unit ever work before the failure?

    Yes. The FA request was rejected by your site, but Cisco now requires us to send the device to you for FA.

    7. Did they violate solder re-flow temperature profiles, moisture sensitivity? Please provide the re-flow temperature profiles.

    No, they were not violated.

    8. Did you swap the failure device to a known good board? Is the failure following the device or board?

    We are currently swapping the device onto a known good board.

    9. Is this a prototype build or volume/mass production?

    Mass production

    10. Kindly provide quantitative investigation results that could prove the failure is Intel FPGA induced.

    See the attachment for the FA details.

    • ChiaLing_T_Intel

      Hi Bruce,

      Thank you, all the information provided is noted. The relevant team will contact you directly regarding the next steps.

      Thank you

      Regards,

      Chia Ling

  • BWU15

    Email details:

    a) Contact name and email address: Bruce Wu / Bruce.wu2@flex.com

    b) Company details: Flex

    c) Debugging steps performed:

    1) Performed failure analysis on the failed board (FDO230700G6) and reviewed the failure log from the 2C station, shown below:

    ????????

    Initializing Hardware ...

    ????????

    Initializing Hardware ...

    Checking for PCIe device presence...

    %ERROR% - Did not find CPLD. Read failure!

    %ERROR: Critical device not found on 00:01.00

    %WARNING: Resetting...

    2) Put the failed board on the debug station at room temperature and powered it on. It was not stable: sometimes it booted normally, and sometimes it failed with the GET_ROMMON symptom (the failure can be reproduced). The failure log and POST code status were the same as at the 2C station.

    3) Based on the failure log and previous debug experience, the failure is related to the FPGA (U1_F1). We checked the FPGA and the surrounding parts and found no process issue. We also measured the impedances and voltages related to the FPGA and CPU and compared them with a known good board; all of them were normal.

    4) Visual inspection (including 2D and 5D X-ray) of the whole PCBA, especially the FPGA (U1_F1) and CPU, found no process issue.

    5) Captured the signals between the FPGA and CPU and found that CPU_LPC_AD<0/1/2/3> showed abnormalities when compared with a known good board. Details below:

    6) Replaced the FPGA with a new one; the failure symptom disappeared.

    d) Device purchase order: 85YY73943
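For readers triaging similar boot logs, the review in step 1 can be sketched in Python as below. This is only an illustrative sketch: the `%ERROR`/`%WARNING` marker formats and the "Initializing Hardware" banner are inferred solely from the excerpt quoted in this post, and the function name `summarize_boot_log` is hypothetical, not part of any Cisco or Intel tool.

```python
import re

# Marker format is an assumption inferred from the log excerpt in this thread:
# "%ERROR% - msg", "%ERROR: msg" and "%WARNING: msg".
MARKER_RE = re.compile(r"^%(?P<level>ERROR|WARNING)%?\s*[:-]\s*(?P<msg>.+)$")


def summarize_boot_log(text):
    """Count hardware-init banners and collect error/warning messages."""
    summary = {"init_count": 0, "errors": [], "warnings": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # A repeated banner suggests the board restarted during boot.
        if line.startswith("Initializing Hardware"):
            summary["init_count"] += 1
            continue
        match = MARKER_RE.match(line)
        if match:
            key = "errors" if match.group("level") == "ERROR" else "warnings"
            summary[key].append(match.group("msg"))
    return summary


# Log text copied from the excerpt above (blank lines removed).
SAMPLE_LOG = """\
????????
Initializing Hardware ...
????????
Initializing Hardware ...
Checking for PCIe device presence...
%ERROR% - Did not find CPLD. Read failure!
%ERROR: Critical device not found on 00:01.00
%WARNING: Resetting...
"""

if __name__ == "__main__":
    print(summarize_boot_log(SAMPLE_LOG))
```

Running the same summary over logs from several power cycles would give a rough count of how often the intermittent failure described in step 2 occurs.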

    • ChiaLing_T_Intel

      Hi Bruce,

      Noted with thanks. The relevant team will contact you directly regarding the next steps.

      Thank you

      Regards,

      Chia Ling

      • BWU15

        Please reply to this urgent case within 24 hours. Thanks!

        Best regards,
        Bruce Wu (SQE)
        Flex (Zhuhai) PCBA-B11 / Cisco Project
        Office: +86 0756 5183079
        Mobile: 134-2399-0614
  • Hi Bruce, I believe the relevant team has contacted you about the next steps. Please let me know if you haven't received the email. Thank you. Regards, Chia Ling
  • BWU15

    What is the FA result? Is there any update? Thanks!

  • BWU15

    Are there any results yet? This has been pending for a long time and we have not received any reply. Cisco is waiting for the result, so please update us soon. Thanks!

    • Zawani_M_Intel

      Hi Bruce,

      We will share the findings with you by private message, since the FA details contain confidential information.

      I really appreciate your understanding.

      Thanks!

      Wani

  • BWU15

    Hi Intel team,

    Since this is a fabrication defect found in the samples, can you provide the CN to scrap the 3 pcs on your side? Also, can we return defective parts to you if components show the “GET_ROMMON” failure in the future? We believe this would help your investigation. Thanks!

    Device #2 and #3:

    Device #2 and #3 failed functional tests at low temperatures. Additional characterization showed:

    • Device #2 failed the transceiver output buffer test and the transceiver ICDR (Interpolator Clock Data Recovery) speed test at 25°C and 0°C.

    • Device #3 failed the transceiver output buffer test and the transceiver ICDR (Interpolator Clock Data Recovery) speed test at 0°C only.

    The ICC values of both ERMA devices are comparable to those of a factory standard device, which rules out electrical overstress (EOS) damage as the cause of the functional failures.

    The device failures are believed to be caused by a random defect. Such a defect is introduced into the devices during the wafer fabrication process and can cause a latent failure. This kind of fabrication defect is random in nature and does not pose any concern for the reliability of other devices.