Hello. Yesterday I figured out what the cold boot issue was and fixed it.
There is a bug in 19.1 where the tool lets you do a pin swap on pcie_perst# without an error message. So when the layout guy asked me to swap it, I did, and since the tool didn't complain I thought it was okay.
20.1.1 caught the mistake at some point (not right away) and refused to build the FPGA until I put it back on PIN_W22.
I worked around the issue for now by putting a weak pullup on PIN_W22 and assigning the net that had been on it to a spare I/O.
I also managed to build a version of the FPGA with PCIe, 256K SRAM buffer, and MSGDMA installed, fitting and meeting timing, although I'm periodically seeing some hold time violations. The part is 87% ALM utilized and 55% SRAM utilized.
The only remaining issue is that the nvme driver fails to put the NVMe device in bus master mode: it times out waiting for the bus master to acknowledge.
To get the full design to fit in the FPGA with my logic, I had to set the "Single DW Completer" flag. I think that should be fine because there is, and only ever will be, a single NVMe card on the PCIe slot.
I'm currently running a vanilla 5.7.10 kernel because I noticed there were changes to some of the altera drivers and I wanted to make sure there wasn't a change that would help my cause.
The PCIe driver reports the following:
[ 0.781196] altera-pcie c0000000.pcie: host bridge /soc/bridge@c0000000/pcie@000000000 ranges:
[ 0.781225] altera-pcie c0000000.pcie: Parsing ranges property...
[ 0.781262] altera-pcie c0000000.pcie: MEM 0x00c0000000..0x00dfffffff -> 0x0000000000
[ 0.781504] altera-pcie c0000000.pcie: PCI host bridge to bus 0000:00
[ 0.781528] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.781545] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] (bus address [0x00000000-0x1fffffff])
[ 0.781557] pci_bus 0000:00: scanning bus
[ 0.781855] pci 0000:00:00.0: [1172:e000] type 01 class 0x060400
[ 0.785218] pci_bus 0000:00: fixups for bus
[ 0.785307] PCI: bus0: Fast back to back transfers disabled
[ 0.785338] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[ 0.785351] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.785439] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[ 0.785928] pci_bus 0000:01: scanning bus
[ 0.786399] pci 0000:01:00.0: [8086:2522] type 00 class 0x010802
[ 0.786709] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xc0003fff 64bit]
[ 0.787092] pci 0000:01:00.0: reg 0x20: [mem 0xc0000000-0xc000ffff 64bit]
[ 0.787328] pci 0000:01:00.0: enabling Extended Tags
[ 0.789559] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x2 link at 0000:00:00.0 (capable of 15.752 Gb/s with 8.0 GT/s PCIe x2 link)
[ 0.791611] pci_bus 0000:01: fixups for bus
[ 0.791701] PCI: bus1: Fast back to back transfers disabled
[ 0.791714] pci_bus 0000:01: bus scan returning with max=01
[ 0.791729] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 0.791761] pci_bus 0000:00: bus scan returning with max=01
[ 0.791787] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff]
[ 0.791805] pci 0000:01:00.0: BAR 4: assigned [mem 0xc0000000-0xc000ffff 64bit]
[ 0.791950] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0010000-0xc0013fff 64bit]
[ 0.792093] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.792134] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff]
[ 0.792353] pcieport 0000:00:00.0: assign IRQ: got 56
[ 0.792415] pcieport 0000:00:00.0: enabling device (0140 -> 0142)
[ 0.792543] pcieport 0000:00:00.0: enabling bus mastering
[ 0.793014] pcieport 0000:00:00.0: PME: Signaling with IRQ 57
[ 0.793261] pcieport 0000:00:00.0: saving config space at offset 0x0 (reading 0xe0001172)
[ 0.793292] pcieport 0000:00:00.0: saving config space at offset 0x4 (reading 0x100546)
[ 0.793320] pcieport 0000:00:00.0: saving config space at offset 0x8 (reading 0x6040001)
[ 0.793348] pcieport 0000:00:00.0: saving config space at offset 0xc (reading 0x10010)
[ 0.793359] pcieport 0000:00:00.0: saving config space at offset 0x10 (reading 0x0)
[ 0.793386] pcieport 0000:00:00.0: saving config space at offset 0x14 (reading 0x0)
[ 0.793412] pcieport 0000:00:00.0: saving config space at offset 0x18 (reading 0x10100)
[ 0.793439] pcieport 0000:00:00.0: saving config space at offset 0x1c (reading 0x0)
[ 0.793465] pcieport 0000:00:00.0: saving config space at offset 0x20 (reading 0x0)
[ 0.793492] pcieport 0000:00:00.0: saving config space at offset 0x24 (reading 0x0)
[ 0.793518] pcieport 0000:00:00.0: saving config space at offset 0x28 (reading 0x0)
[ 0.793544] pcieport 0000:00:00.0: saving config space at offset 0x2c (reading 0x0)
[ 0.793571] pcieport 0000:00:00.0: saving config space at offset 0x30 (reading 0x0)
[ 0.793597] pcieport 0000:00:00.0: saving config space at offset 0x34 (reading 0x50)
[ 0.793623] pcieport 0000:00:00.0: saving config space at offset 0x38 (reading 0x0)
[ 0.793649] pcieport 0000:00:00.0: saving config space at offset 0x3c (reading 0x30138)
[ 0.796920] dma-pl330 ffe01000.pdma: Loaded driver for PL330 DMAC-341330
[ 0.796940] dma-pl330 ffe01000.pdma: DBUFF-512x8bytes Num_Chans-8 Num_Peri-32 Num_Events-8
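As a sanity check on the bandwidth line in the log above (4.000 Gb/s at 2.5 GT/s x2, 15.752 Gb/s at 8.0 GT/s x2), here is a small Python sketch of the arithmetic. The encoding overheads (8b/10b for 2.5 and 5.0 GT/s, 128b/130b for 8.0 GT/s) are standard PCIe. Truncating to whole Mb/s per lane before multiplying by the lane count is my assumption, chosen because it reproduces the numbers the kernel printed. The function and table names are made up for illustration.

```python
# Raw rate in MT/s plus (payload bits, total bits) of the line encoding.
ENCODING = {
    "2.5 GT/s": (2500, 8, 10),     # 8b/10b
    "5.0 GT/s": (5000, 8, 10),     # 8b/10b
    "8.0 GT/s": (8000, 128, 130),  # 128b/130b
}

def link_bandwidth_mbps(speed, lanes):
    """Usable link bandwidth in Mb/s for a given speed and lane count."""
    rate, payload, total = ENCODING[speed]
    # Assumption: truncate to whole Mb/s per lane, then scale by lanes.
    per_lane = rate * payload // total
    return per_lane * lanes

def as_gbps(mbps):
    """Format Mb/s as a Gb/s string with three decimals, using integer math."""
    return f"{mbps // 1000}.{mbps % 1000:03d} Gb/s"

if __name__ == "__main__":
    print(as_gbps(link_bandwidth_mbps("2.5 GT/s", 2)))
    print(as_gbps(link_bandwidth_mbps("8.0 GT/s", 2)))
```

So the link is training at Gen1 x2 even though the card could do Gen3 x2, which matches what the log says about the limit being at 0000:00:00.0.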
The NVMe driver reports the following:
[ 1.879231] nvme 0000:01:00.0: assign IRQ: got 56
[ 1.884200] nvme nvme0: pci function 0000:01:00.0
[ 1.889047] nvme 0000:01:00.0: enabling device (0140 -> 0142)
[ 1.894865] nvme 0000:01:00.0: enabling bus mastering
[ 1.904632] nvme 0000:01:00.0: saving config space at offset 0x0 (reading 0x25228086)
[ 1.915605] nvme 0000:01:00.0: saving config space at offset 0x4 (reading 0x100146)
[ 1.929796] nvme 0000:01:00.0: saving config space at offset 0x8 (reading 0x1080200)
[ 1.943672] nvme 0000:01:00.0: saving config space at offset 0xc (reading 0x10)
[ 1.956942] nvme 0000:01:00.0: saving config space at offset 0x10 (reading 0x10004)
[ 1.964611] nvme 0000:01:00.0: saving config space at offset 0x14 (reading 0x0)
[ 1.971942] nvme 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
[ 1.985610] nvme 0000:01:00.0: saving config space at offset 0x1c (reading 0x0)
[ 1.998116] nvme 0000:01:00.0: saving config space at offset 0x20 (reading 0x4)
[ 2.012843] nvme 0000:01:00.0: saving config space at offset 0x24 (reading 0x0)
[ 2.027572] nvme 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
[ 2.040072] nvme 0000:01:00.0: saving config space at offset 0x2c (reading 0x38108086)
[ 2.054023] nvme 0000:01:00.0: saving config space at offset 0x30 (reading 0x0)
[ 2.069193] nvme 0000:01:00.0: saving config space at offset 0x34 (reading 0x40)
[ 2.089687] nvme 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
[ 2.096992] nvme 0000:01:00.0: saving config space at offset 0x3c (reading 0x138)
Then later, after a 62-second delay, the kernel reports:
[ 64.498070] nvme nvme0: I/O 20 QID 0 timeout, disable controller
[ 64.504259] nvme nvme0: Identify Controller failed (-4)
[ 64.509492] nvme nvme0: Removing after probe failure status: -5
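To make the config space dumps above easier to read, here is a minimal Python sketch that decodes the dword saved at offset 0x4 (Command register in the low 16 bits, Status register in the high 16 bits). The bit names follow the standard PCI Command register layout. The helper names are mine, and only the bits relevant here are listed, so treat it as a reading aid rather than a complete decoder.

```python
# Subset of PCI Command register bits (low 16 bits of the dword at offset 0x4).
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}

def split_dword(dword):
    """Return (command, status) from the 32-bit value at config offset 0x4."""
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF

def decode_command(dword):
    """List the names of the known Command bits that are set."""
    command, _status = split_dword(dword)
    return [name for bit, name in sorted(COMMAND_BITS.items())
            if command & (1 << bit)]

if __name__ == "__main__":
    # Values copied from the "saving config space at offset 0x4" log lines.
    for label, value in (("pcieport 0000:00:00.0", 0x100546),
                         ("nvme 0000:01:00.0", 0x100146)):
        command, status = split_dword(value)
        print(f"{label}: command=0x{command:04x} status=0x{status:04x}")
        for name in decode_command(value):
            print(f"  {name}")
```

If I am reading the dumps correctly, Bus Master Enable (bit 2) reads back as set on both the root port and the NVMe device, so the config space write itself seems to land and the timeout may be happening further along.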
I have tried four brands of NVMe card: Intel, Samsung, Western Digital, and Greenliant. The Samsung drive is x4, the others are x2. All were properly identified.
Note: the output from linux-socfpga-5.4.74 is almost the same (just a little less verbose).
I'm hoping someone will have some idea where I need to look to figure out what is going wrong in the kernel.
Any help would be greatly appreciated.