Another week, another bit of progress.
First the observations:
1. I confirmed that the MSGDMA and the 256KB of SRAM are not used: I removed them and got the same performance. This saves a lot of space, and performance is good enough at 140MB/s read.
2. The most important thing I missed, and forgot to mention last week, was the third and fourth FPGA bridges in the dts file. Here they are so others can add them. (Note: these are already in the latest kernel; you just have to activate them. I found them there yesterday, after spending weeks trying to figure out why DMA didn't work and adding them myself last week.)
fpga_bridge0: fpga_bridge@ff400000 {
    compatible = "altr,socfpga-lwhps2fpga-bridge";
    reg = <0xff400000 0x100000>;
    resets = <&rst LWHPS2FPGA_RESET>;
    clocks = <&l4_main_clk>;
};
fpga_bridge1: fpga_bridge@ff500000 {
    compatible = "altr,socfpga-hps2fpga-bridge";
    reg = <0xff500000 0x10000>;
    resets = <&rst HPS2FPGA_RESET>;
    clocks = <&l4_main_clk>;
};
fpga_bridge2: fpga-bridge@ff600000 {
    compatible = "altr,socfpga-fpga2hps-bridge";
    reg = <0xff600000 0x100000>;
    resets = <&rst FPGA2HPS_RESET>;
    clocks = <&l4_main_clk>;
    status = "disabled";
};
fpga_bridge3: fpga-bridge@ffc25080 {
    compatible = "altr,socfpga-fpga2sdram-bridge";
    reg = <0xffc25080 0x4>;
    status = "disabled";
};
fpgamgr0: fpgamgr@ff706000 {
    compatible = "altr,socfpga-fpga-mgr";
    reg = <0xff706000 0x1000
           0xffb90000 0x4>;
    interrupts = <0 175 4>;
};
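For anyone on a kernel that already ships these nodes, a minimal sketch of "activating" them is to override the status property from your own board dts rather than copying the nodes. This assumes the fpga_bridge2 and fpga_bridge3 labels shown above are defined in the .dtsi your board file includes; I have not checked whether anything else is needed to actually open the bridges at runtime.

```dts
/* Sketch only: assumes the labels fpga_bridge2 and fpga_bridge3
 * exist in the included SoC .dtsi, as in the listing above. */
&fpga_bridge2 {
    status = "okay";    /* FPGA2HPS bridge, "disabled" by default */
};

&fpga_bridge3 {
    status = "okay";    /* FPGA2SDRAM bridge, "disabled" by default */
};
```

Overriding by label keeps the board file short and picks up any later fixes to the nodes in the kernel tree.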
3. The evil hack I came up with to fix the read errors is documented here. It seems to work but, as with all hacks, YMMV. (No, I never understood how the DMA destination address is supposed to work, and I'm not even positive I understand why this works.)
Add this to the top of your dts:
reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;
    // 2 MiB reserved for PCI Express DMA
    pcidma1@0 {
        reg = <0x00000000 0x00200000>;
        // Note: this is the maximum you can reserve with kernel defaults, and relocating the kernel doesn't seem to work on the ARM arch.
        no-map;
    };
};
This reserves the first 2MB of DRAM. The kernel complains:
ERROR: reserving fdt memory region failed (addr=0 size=200000)
However, the reservation still takes effect, as shown by 'cat /proc/iomem':
root@cyclone5:/mnt/test# cat /proc/iomem
00200000-3fffffff : System RAM <- Notice first 2MB is not used...
00c00000-00cabe7f : Kernel data
c0000000-c01fffff : pcie@000000000
c0000000-c00fffff : PCI Bus 0000:01
c0000000-c0003fff : 0000:01:00.0
c0000000-c0003fff : nvme
ff200000-ff20007f : ff200080.msi vector_slave
ff200080-ff20008f : ff200080.msi csr
ff220000-ff223fff : c0000000.pcie Cra
ff700000-ff701fff : ff700000.ethernet ethernet@ff700000
ff702000-ff703fff : ff702000.ethernet ethernet@ff702000
ff704000-ff704fff : ff704000.dwmmc0 dwmmc0@ff704000
ff705000-ff705fff : ff705000.spi spi@ff705000
ff706000-ff706fff : ff706000.fpgamgr fpgamgr@ff706000
ff709000-ff709fff : ff709000.gpio gpio@ff709000
ffa00000-ffa00fff : ff705000.spi spi@ff705000
ffb90000-ffb90003 : ff706000.fpgamgr fpgamgr@ff706000
ffc02000-ffc0201f : serial
ffc03000-ffc0301f : serial
ffc04000-ffc04fff : ffc04000.i2c i2c@ffc04000
ffd02000-ffd02fff : ffd02000.watchdog watchdog@ffd02000
ffd05000-ffd05fff : rstmgr
ffe01000-ffe01fff : pdma@ffe01000
ffff0000-ffffffff : ffff0000.sram
Next, make the Txs bus on the Avalon-MM PCIe hard macro 2MB in size by setting it to 32 bits wide with a 1MB address width and two address pages. (This might work with 1MB in 64-bit mode. I will verify that later, to shrink the reserved memory area, once I figure out my latest issue.)
On to the latest issue.
Interrupts are being dropped, either by hardware or by software. (It seems to happen more on writes than on reads.)
In this example I copy a 463MB file from the SD card to the NVMe drive:
root@cyclone5:/mnt/test# cp /home/root/463MB.bin 3MB_Copy.bin
[ 226.418525] nvme nvme0: I/O 128 QID 1 timeout, completion polled
[ 257.138480] nvme nvme0: I/O 160 QID 1 timeout, completion polled
[ 290.408531] nvme nvme0: I/O 192 QID 1 timeout, completion polled
[ 320.498581] nvme nvme0: I/O 260 QID 1 timeout, completion polled
[ 350.569413] nvme nvme0: I/O 288 QID 1 timeout, completion polled
[ 380.648522] nvme nvme0: I/O 288 QID 1 timeout, completion polled
[ 410.728549] nvme nvme0: I/O 288 QID 1 timeout, completion polled
[ 440.808519] nvme nvme0: I/O 288 QID 1 timeout, completion polled
[ 470.888467] nvme nvme0: I/O 320 QID 1 timeout, completion polled
[ 500.968466] nvme nvme0: I/O 329 QID 1 timeout, completion polled
[ 534.898518] nvme nvme0: I/O 323 QID 1 timeout, completion polled
root@cyclone5:/mnt/test#
root@cyclone5:/mnt/test# [ 544.193059] systemd-journald[64]: Sent WATCHDOG=1 notification.
[ 564.968528] nvme nvme0: I/O 352 QID 1 timeout, completion polled
[ 595.048526] nvme nvme0: I/O 384 QID 1 timeout, completion polled
[ 625.128505] nvme nvme0: I/O 418 QID 1 timeout, completion polled
[ 655.848455] nvme nvme0: I/O 469 QID 1 timeout, completion polled
You can see that a great many interrupts are being dropped. I expect I will have to instrument the altera-msi.c code with printk's to figure this out. So far, none of my attempts to improve things by adjusting the hardware have made any difference. I'm tempted to try an older kernel, but that might open a bigger can of worms.
Again, if anyone knows anything about this issue, I'm all ears.