Forum Discussion

PhaniNarasimham
2 years ago

PCIe bridge crash on Icelake-SP running pktgen on multiple ports

Sir,

We are running pktgen on multiple ports on an Ice Lake-SP (ICELAKE) system.

*-pci:0
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:50:02.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pci pciexpress pm msi normal_decode bus_master cap_list
configuration: driver=pcieport
resources: iomemory:202f0-202ef irq:127 memory:202ffff20000-202ffff3ffff ioport:8000(size=4096) memory:d0b00000-d0efffff ioport:202ff4000000(size=173015040)
*-pci:1
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 4
bus info: pci@0000:50:04.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pci pciexpress pm msi normal_decode bus_master cap_list
configuration: driver=pcieport
resources: iomemory:202f0-202ef irq:128 memory:202ffff00000-202ffff1ffff ioport:9000(size=4096) memory:d0300000-d0afffff ioport:202fe0000000(size=307232768)

We are getting a hardware crash. After the reboot, the kernel reports this error record from the previous boot:

Dec 16 14:02:34 HWHA2030006 kernel: BERT: Error records from previous boot:
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: event severity: fatal
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: Error 0, type: fatal
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: section_type: PCIe error
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: port_type: 4, root port
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: version: 3.0
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: command: 0x0540, status: 0x0010
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: device_id: 0000:50:02.0
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: slot: 2
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: secondary_bus: 0x00
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: class_code: 000406
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0000
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: aer_uncor_status: 0x00000000, aer_uncor_mask: 0x00100020
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: aer_uncor_severity: 0x00463010
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: TLP Header: 0a000000 51030004 fd810000 00000000

Any idea how to debug and fix this issue, sir?
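As a first debugging step, the raw register values in the error record can be turned into named bits. The sketch below is only an aid for reading the log, not a diagnosis. The bit names are taken from the standard PCI command/status registers and the PCIe AER uncorrectable error registers as commonly documented, and should be checked against the specification revision that applies to this platform. The `decode` helper and the table names are made up for this example.

```python
"""Decode PCI/PCIe register values printed in a kernel BERT/AER error record."""

# AER Uncorrectable Error Status, Mask and Severity share one bit layout.
# Assumption: names follow the PCIe AER capability; verify against the spec.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

# PCI Command register (config space offset 0x04).
PCI_COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}

# PCI Status register (config space offset 0x06).
PCI_STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}

# A bridge's Secondary Status register uses the same error bits,
# except that bit 14 means the error was received, not signaled.
SECONDARY_STATUS_BITS = {**PCI_STATUS_BITS, 14: "Received System Error"}


def decode(value, names):
    """Return the names of all bits set in value, lowest bit first."""
    return [
        names.get(bit, f"bit {bit} (not in table)")
        for bit in range(32)
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    # Values copied from the error record in this post.
    record = [
        ("command", 0x0540, PCI_COMMAND_BITS),
        ("status", 0x0010, PCI_STATUS_BITS),
        ("secondary_status", 0x2000, SECONDARY_STATUS_BITS),
        ("aer_uncor_status", 0x00000000, AER_UNCOR_BITS),
        ("aer_uncor_mask", 0x00100020, AER_UNCOR_BITS),
        ("aer_uncor_severity", 0x00463010, AER_UNCOR_BITS),
    ]
    for label, value, names in record:
        print(f"{label} = {value:#010x}: {decode(value, names) or 'no bits set'}")
```

With the values from this record, aer_uncor_status 0x00000000 has no bits set, aer_uncor_mask 0x00100020 masks Surprise Down Error and Unsupported Request, and secondary_status 0x2000 decodes to Received Master Abort.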

4 Replies

  • smt

    Hi,


    Which IP are you using? Can you share the .ip settings?

    Thanks.

    • PhaniNarasimham

      [root@HWHA3300011 log]# modinfo ice
      filename: /lib/modules/4.18.0-372.32.1.rt7.189.el8.x86_64/updates/drivers/net/ethernet/intel/ice/ice.ko
      firmware: intel/ice/ddp/ice.pkg
      version: 1.9.11
      license: GPL v2
      description: Intel(R) Ethernet Connection E800 Series Linux Driver
      author: Intel Corporation, <linux.nics@intel.com>
      rhelversion: 8.6
      srcversion: 06B8AD97B187AB8A177D9BB
      alias: pci:v00008086d00001888sv*sd*bc*sc*i*
      alias: pci:v00008086d0000579Fsv*sd*bc*sc*i*
      alias: pci:v00008086d0000579Esv*sd*bc*sc*i*
      alias: pci:v00008086d0000579Dsv*sd*bc*sc*i*
      alias: pci:v00008086d0000579Csv*sd*bc*sc*i*
      alias: pci:v00008086d0000151Dsv*sd*bc*sc*i*
      alias: pci:v00008086d0000124Fsv*sd*bc*sc*i*
      alias: pci:v00008086d0000124Esv*sd*bc*sc*i*
      alias: pci:v00008086d0000124Dsv*sd*bc*sc*i*
      alias: pci:v00008086d0000124Csv*sd*bc*sc*i*
      alias: pci:v00008086d0000189Asv*sd*bc*sc*i*
      alias: pci:v00008086d00001899sv*sd*bc*sc*i*
      alias: pci:v00008086d00001898sv*sd*bc*sc*i*
      alias: pci:v00008086d00001897sv*sd*bc*sc*i*
      alias: pci:v00008086d00001894sv*sd*bc*sc*i*
      alias: pci:v00008086d00001893sv*sd*bc*sc*i*
      alias: pci:v00008086d00001892sv*sd*bc*sc*i*
      alias: pci:v00008086d00001891sv*sd*bc*sc*i*
      alias: pci:v00008086d00001890sv*sd*bc*sc*i*
      alias: pci:v00008086d0000188Esv*sd*bc*sc*i*
      alias: pci:v00008086d0000188Dsv*sd*bc*sc*i*
      alias: pci:v00008086d0000188Csv*sd*bc*sc*i*
      alias: pci:v00008086d0000188Bsv*sd*bc*sc*i*
      alias: pci:v00008086d0000188Asv*sd*bc*sc*i*
      alias: pci:v00008086d0000159Bsv*sd*bc*sc*i*
      alias: pci:v00008086d0000159Asv*sd*bc*sc*i*
      alias: pci:v00008086d00001599sv*sd*bc*sc*i*
      alias: pci:v00008086d00001593sv*sd*bc*sc*i*
      alias: pci:v00008086d00001592sv*sd*bc*sc*i*
      alias: pci:v00008086d00001591sv*sd*bc*sc*i*
      depends:
      name: ice
      vermagic: 4.18.0-372.32.1.rt7.189.el8.x86_64 SMP preempt_rt mod_unload modversions
      parm: debug:netif level (0=none,...,16=all) (int)
      parm: fwlog_level:FW event level to log. All levels <= to the specified value are enabled. Values: 0=none, 1=error, 2=warning, 3=normal, 4=verbose. Invalid values: >=5 (ushort)
      parm: fwlog_events:FW events to log (32-bit mask)

      • Zhaoxuan1

        Hi,

        Which IP did you choose and compile in Quartus? Could you share the IP parameters you set in Platform Designer? Which tile are you using? Could you share the device output from before and after encountering the errors, captured with the lspci -vvvs B:D.F command?

        Best Regards

        Zhao Xuan

  • KhaiChein_Y_Intel

    Hi,

    We have not received any response from you to the previous question. This thread will be transitioned to community support.

    If you have a new question, feel free to open a new thread to get support from Intel experts.

    Otherwise, the community users will continue to help you on this thread.

    Thank you.


    Best regards,

    Khai
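
The request earlier in the thread for lspci -vvvs B:D.F output before and after the error could be scripted roughly as follows. This is a minimal sketch, assuming lspci (pciutils) is installed and the script runs as root so the full capability list is readable. The function names, the label argument and the output file naming are invented for this example; 50:02.0 is the bridge named in the error record.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def lspci_command(bdf):
    """Build the argv for `lspci -vvv -s <bdf>`, e.g. bdf='50:02.0'."""
    return ["lspci", "-vvv", "-s", bdf]


def capture(bdf, label, out_dir="."):
    """Run lspci for one device and save its output to a labelled text file.

    label is free text such as 'before' or 'after' (hypothetical convention).
    Returns the path of the file that was written.
    """
    result = subprocess.run(
        lspci_command(bdf), capture_output=True, text=True, check=True
    )
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    safe_bdf = bdf.replace(":", "-")  # keep the file name portable
    path = Path(out_dir) / f"lspci_{safe_bdf}_{label}_{stamp}.txt"
    path.write_text(result.stdout)
    return path
```

Calling capture("50:02.0", "before") ahead of the pktgen run and capture("50:02.0", "after") once the system is back up gives two files that can be diffed and attached to the thread.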