PCIe bridge crash on Ice Lake-SP when running pktgen on multiple ports
Sir,
We are running pktgen on multiple ports on an Ice Lake system. The relevant PCI bridges, as reported by lshw, are:
*-pci:0
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:50:02.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pci pciexpress pm msi normal_decode bus_master cap_list
configuration: driver=pcieport
resources: iomemory:202f0-202ef irq:127 memory:202ffff20000-202ffff3ffff ioport:8000(size=4096) memory:d0b00000-d0efffff ioport:202ff4000000(size=173015040)
*-pci:1
description: PCI bridge
product: Intel Corporation
vendor: Intel Corporation
physical id: 4
bus info: pci@0000:50:04.0
version: 04
width: 64 bits
clock: 33MHz
capabilities: pci pciexpress pm msi normal_decode bus_master cap_list
configuration: driver=pcieport
resources: iomemory:202f0-202ef irq:128 memory:202ffff00000-202ffff1ffff ioport:9000(size=4096) memory:d0300000-d0afffff ioport:202fe0000000(size=307232768)
With this load, the system crashes with a fatal hardware error. The following BERT record is reported on the next boot:
Dec 16 14:02:34 HWHA2030006 kernel: BERT: Error records from previous boot:
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: event severity: fatal
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: Error 0, type: fatal
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: section_type: PCIe error
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: port_type: 4, root port
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: version: 3.0
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: command: 0x0540, status: 0x0010
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: device_id: 0000:50:02.0
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: slot: 2
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: secondary_bus: 0x00
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: class_code: 000406
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0000
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: aer_uncor_status: 0x00000000, aer_uncor_mask: 0x00100020
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: aer_uncor_severity: 0x00463010
Dec 16 14:02:34 HWHA2030006 kernel: [Hardware Error]: TLP Header: 0a000000 51030004 fd810000 00000000
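The aer_uncor_* values in the record are raw register contents. One way to read them is to decode each set bit against the AER Uncorrectable Error Status/Mask/Severity bit layout from the PCIe Base Specification (the same layout applies to all three registers; in the severity register a set bit means the error is treated as fatal). The following is a minimal sketch assuming that standard layout; `decode_aer` and `AER_UNCOR_BITS` are hypothetical names used only for illustration.

```python
# Bit assignments shared by the PCIe AER Uncorrectable Error Status, Mask and
# Severity registers (PCIe Base Specification). Bits not listed here are
# reported generically rather than guessed.
AER_UNCOR_BITS = {
    0: "Undefined (formerly Training Error)",
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
    23: "MC Blocked TLP",
    24: "AtomicOp Egress Blocked",
    25: "TLP Prefix Blocked",
}


def decode_aer(value):
    """Return the names of the bits set in a 32-bit AER uncorrectable register."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(AER_UNCOR_BITS.get(bit, f"bit {bit} (not listed here)"))
    return names


if __name__ == "__main__":
    # Register values copied from the BERT record for 0000:50:02.0.
    registers = {
        "aer_uncor_status": 0x00000000,
        "aer_uncor_mask": 0x00100020,
        "aer_uncor_severity": 0x00463010,
    }
    for name, value in registers.items():
        decoded = decode_aer(value) or ["(no bits set)"]
        print(f"{name} = 0x{value:08x}: {', '.join(decoded)}")
```

With the values from this record, the mask decodes to Surprise Down Error and Unsupported Request, the severity register marks Data Link Protocol Error, Poisoned TLP, Flow Control Protocol Error, Receiver Overflow, Malformed TLP and Uncorrectable Internal Error as fatal, and the status register has no bits set.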
Any idea how to debug and fix this issue, sir?