Forum Discussion
Hi Silvan,
When the devkit gets "stuck" waiting for a receive, try running ethtool -d <eth_interface> to dump all the registers. This might provide deeper insight into the problem.
An experiment worth trying is to change the size of the ICMP packets sent by the ping command with the -s option. I suggest trying values of 128, 129, 130, and 131 to check whether alignment is part of the problem.
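As a rough aid for the alignment experiment, the sketch below shows how each -s payload size maps to the ICMP length that tcpdump reports and to the frame size modulo 4. It assumes plain IPv4 without options and an untagged Ethernet frame counted without the FCS; the helper name `sizes` is made up for illustration.

```python
# Header sizes assumed here: ICMP echo header 8 bytes, IPv4 header without
# options 20 bytes, Ethernet header without VLAN tag or FCS 14 bytes.
ICMP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER = 14


def sizes(payload):
    """Return the derived lengths for one ping -s payload size."""
    icmp_length = payload + ICMP_HEADER  # the "length" field tcpdump prints
    frame_bytes = ETH_HEADER + IPV4_HEADER + icmp_length
    return {
        "payload": payload,
        "icmp_length": icmp_length,
        "frame_bytes": frame_bytes,
        "frame_mod_4": frame_bytes % 4,
    }


table = [sizes(p) for p in (128, 129, 130, 131)]
for row in table:
    print(row)
```

Four consecutive payload sizes cover all four residues modulo 4, which is why this range is enough to probe a 32-bit alignment dependency.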
Another experiment to try is to enable "busy polling" every millisecond with the following command:
# echo 1000 > /proc/sys/net/core/busy_poll
The parameter is in microseconds, so I would expect the RX path to get unstuck after 1 millisecond.
Regards,
Matthew
Hi Matthew,
Here is the additional information. First, the register values that I got with ethtool -d eth0 in the error state:
root@arria10:~# ethtool -d eth0
ST GMAC Registers
GMAC Registers
Reg0 0x00610C0C
Reg1 0x00000404
Reg2 0x00000000
Reg3 0x00000000
Reg4 0x00003A90
Reg5 0x00003C00
Reg6 0xFFFF000E
Reg7 0x00000000
Reg8 0x00001037
Reg9 0x00000120
Reg10 0x00000000
Reg11 0x00000000
Reg12 0x00020000
Reg13 0x03E80000
Reg14 0x00000001
Reg15 0x00000201
Reg16 0x80008EAC
Reg17 0x9242062E
Reg18 0x00000000
Reg19 0x00000000
Reg20 0x00000000
Reg21 0x00000000
Reg22 0x00000000
Reg23 0x00000000
Reg24 0x00000000
Reg25 0x00000000
Reg26 0x00000000
Reg27 0x00000000
Reg28 0x00000000
Reg29 0x00000000
Reg30 0x00000000
Reg31 0x00000000
Reg32 0x00000000
Reg33 0x00000000
Reg34 0x00000000
Reg35 0x00000000
Reg36 0x00000000
Reg37 0x00000000
Reg38 0x00000000
Reg39 0x00000000
Reg40 0x00000000
Reg41 0x00000000
Reg42 0x00000000
Reg43 0x00000000
Reg44 0x00000000
Reg45 0x00000000
Reg46 0x00000000
Reg47 0x00000000
Reg48 0x00000000
Reg49 0x00000000
Reg50 0x00000000
Reg51 0x00000000
Reg52 0x00000000
Reg53 0x00000000
Reg54 0x0000000D
DMA Registers
Reg0 0x01900880
Reg1 0x00000000
Reg2 0x00000000
Reg3 0x027B0000
Reg4 0x027B8000
Reg5 0x00660404
Reg6 0x02202906
Reg7 0x0001A061
Reg8 0x000000B7
Reg9 0x00000000
Reg10 0x00FF0009
Reg11 0x00000000
Reg12 0x00000000
Reg13 0x00000000
Reg14 0x00000000
Reg15 0x00000000
Reg16 0x00000000
Reg17 0x00000000
Reg18 0x027B9FE0
Reg19 0x027B3DC0
Reg20 0x030047CA
Reg21 0x02E0F000
Reg22 0x170D69BF
root@arria10:~#
I ran the tests with different ping message sizes (-s option). The size has no effect on the behavior. On my host PC I started tcpdump, which produced the following output:
11:59:55.158944 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21008, seq 1, length 136
11:59:55.159162 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 20934, seq 1, length 64
12:00:02.411714 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21139, seq 1, length 137
12:00:02.412154 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21008, seq 1, length 136
12:00:09.874869 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21356, seq 1, length 138
12:00:09.875066 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21139, seq 1, length 137
12:00:27.903913 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21687, seq 1, length 139
12:00:27.904126 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21356, seq 1, length 138
12:00:32.869880 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21766, seq 1, length 64
12:00:32.870074 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21687, seq 1, length 139
The reply is always the one to the previous request, not to the current one. This is independent of when the new request was sent.
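To make the one-packet lag easier to check on longer captures, here is a minimal sketch that parses tcpdump lines of the form shown above and reports, for each reply, how many requests behind the newest request it is. The function names `parse_icmp` and `reply_lag` are made up for illustration, and the regular expression only covers this exact ICMP echo line format.

```python
import re

# Matches the tail of lines such as:
# "... ICMP echo request, id 21008, seq 1, length 136"
LINE_RE = re.compile(
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), "
    r"seq (?P<seq>\d+), length (?P<length>\d+)"
)


def parse_icmp(lines):
    """Return (kind, id, length) tuples for every ICMP echo line."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            events.append((m["kind"], int(m["id"]), int(m["length"])))
    return events


def reply_lag(events):
    """For each reply, count how many requests behind the newest one it is.

    0 means the reply answers the most recent request, 1 the one before it.
    None means the reply's id never appeared as a request in the capture.
    """
    request_ids = []
    lags = []
    for kind, ident, _length in events:
        if kind == "request":
            request_ids.append(ident)
        elif ident in request_ids:
            lags.append(len(request_ids) - 1 - request_ids.index(ident))
        else:
            lags.append(None)
    return lags


HOST_CAPTURE = """\
11:59:55.158944 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21008, seq 1, length 136
11:59:55.159162 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 20934, seq 1, length 64
12:00:02.411714 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21139, seq 1, length 137
12:00:02.412154 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21008, seq 1, length 136
12:00:09.874869 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21356, seq 1, length 138
12:00:09.875066 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21139, seq 1, length 137
12:00:27.903913 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21687, seq 1, length 139
12:00:27.904126 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21356, seq 1, length 138
12:00:32.869880 IP heldsksm1 > 192.168.2.72: ICMP echo request, id 21766, seq 1, length 64
12:00:32.870074 IP 192.168.2.72 > heldsksm1: ICMP echo reply, id 21687, seq 1, length 139
"""

lags = reply_lag(parse_icmp(HOST_CAPTURE.splitlines()))
print(lags)
```

On the host capture, the first reply belongs to a request sent before the capture started (hence None), and every later reply is exactly one request behind.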
In parallel, I also ran tcpdump on the Arria 10 device:
08:03:56.351322 IP 192.168.2.201 > arria10: ICMP echo request, id 20934, seq 1, length 64
08:03:56.351389 IP arria10 > 192.168.2.201: ICMP echo reply, id 20934, seq 1, length 64
08:04:03.604107 IP 192.168.2.201 > arria10: ICMP echo request, id 21008, seq 1, length 136
08:04:03.604171 IP arria10 > 192.168.2.201: ICMP echo reply, id 21008, seq 1, length 136
08:04:11.067233 IP 192.168.2.201 > arria10: ICMP echo request, id 21139, seq 1, length 137
08:04:11.067297 IP arria10 > 192.168.2.201: ICMP echo reply, id 21139, seq 1, length 137
08:04:29.096255 IP 192.168.2.201 > arria10: ICMP echo request, id 21356, seq 1, length 138
08:04:29.096331 IP arria10 > 192.168.2.201: ICMP echo reply, id 21356, seq 1, length 138
08:04:34.062204 IP 192.168.2.201 > arria10: ICMP echo request, id 21687, seq 1, length 139
08:04:34.062267 IP arria10 > 192.168.2.201: ICMP echo reply, id 21687, seq 1, length 139
It seems that the Arria 10 device receives the ping messages and replies to them immediately. But we can see the packet offset: the Arria 10 device receives the "old" 64-byte message and replies to it at the moment the host is sending the 136-byte message.
Based on that observation, combined with the contents of Register 9 (gmacgrp_debug), it seems that the source of the offset is in the Rx path, and that the Rx FIFO readout controller is responsible for it. Do you know of any registers that give more information about the readout controller of the MAC?
I think that in this case there is no way to influence this directly at the OS level, because the offset originates down in the MAC hardware.
I also tried the "polling" configuration that you suggested, and additionally set it for /proc/sys/net/core/busy_read. Neither of them changed anything.
If the issue is indeed in the FIFO readout controller, I would not expect the polling setting to change the behavior either, because polling works on the descriptors in kernel memory and not on the FIFO state.
Any idea how we can solve the issue, or how we can get more information about the root cause? Maybe it is possible to trigger the FIFO read controller manually to get it back in sync?
Thank you for your support and best regards,
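For reference, a minimal sketch of how the Rx-side fields of the Register 9 value from the dump (0x00000120) can be decoded. The bit positions and state names are an assumption taken from the debug-register definitions in the Linux stmmac driver for the Synopsys GMAC (dwmac1000); they should be checked against the Arria 10 HPS EMAC register map before drawing conclusions. The helper `decode_rx_debug` is made up for illustration.

```python
# Assumed field layout (from the Linux stmmac dwmac1000 definitions):
#   bit 0     RPESTS    MAC receive protocol engine active
#   bits 2:1  RFCFCSTS  MAC receive frame controller FIFO status
#   bit 4     RWCSTS    Rx FIFO write controller active
#   bits 6:5  RRCSTS    Rx FIFO read controller state
#   bits 9:8  RXFSTS    Rx FIFO fill level
RRCSTS_STATES = {
    0: "idle",
    1: "reading frame data",
    2: "reading frame status (or timestamp)",
    3: "flushing frame data and status",
}
RXFSTS_LEVELS = {
    0: "empty",
    1: "below flow-control deactivate threshold",
    2: "above flow-control activate threshold",
    3: "full",
}


def decode_rx_debug(value):
    """Split the Rx-related bits of the GMAC debug register into fields."""
    return {
        "RPESTS": value & 0x1,
        "RFCFCSTS": (value >> 1) & 0x3,
        "RWCSTS": (value >> 4) & 0x1,
        "RRCSTS": RRCSTS_STATES[(value >> 5) & 0x3],
        "RXFSTS": RXFSTS_LEVELS[(value >> 8) & 0x3],
    }


decoded = decode_rx_debug(0x00000120)
for name, state in decoded.items():
    print(f"{name}: {state}")
```

Under that assumed layout, 0x120 would mean the Rx FIFO is not empty and the read controller is in the "reading frame data" state while the write controller is inactive, which is consistent with a frame sitting in the FIFO.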
Silvan