Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load

From: David Arendt
Date: Sun Sep 16 2018 - 08:38:32 EST


Hi,

I applied the patch one hour ago. So far there are no problems, but
because the problems sometimes only appeared after a few hours, I will
only know for certain tomorrow whether the patch helped.

If not, I will try bisecting the problem.

For information, here are the differences in the ethtool output between
the working driver from 4.17.14 and the patched one from 4.18.8:

--- working.txt 2018-09-16 14:14:00.544376935 +0200
+++ patched.txt 2018-09-16 14:20:09.445660915 +0200
@@ -5,2 +5,2 @@
-0x10: Dump Tally Counter Command   0xf900c000 0x00000007
-0x20: Tx Normal Priority Ring Addr 0xf3aa7000 0x00000007
+0x10: Dump Tally Counter Command   0xf9260000 0x00000007
+0x20: Tx Normal Priority Ring Addr 0xebb73000 0x00000007
@@ -17 +17 @@
-0x40: Tx Configuration                        0x4f000f80
+0x40: Tx Configuration                        0x4f000f00
@@ -31,2 +31,2 @@
-0x64: TBI control and status                  0x17ffff01
-0x68: TBI Autonegotiation advertisement (ANAR)    0xf70c
+0x64: TBI control and status                  0x00000000
+0x68: TBI Autonegotiation advertisement (ANAR)    0x0000
@@ -35 +35 @@
-0x84: PM wakeup frame 0            0x04000000 0x7c5b5c95
+0x84: PM wakeup frame 0            0x04000000 0x710b8deb
@@ -57 +57 @@
-0xE4: Rx Ring Addr                 0xf3b64000 0x00000007
+0xE4: Rx Ring Addr                 0xef9f0000 0x00000007
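
A minimal sketch of how two such register dumps could be compared
programmatically, in case the comparison has to be repeated across several
kernels during a bisect. It assumes each register line has the form
"offset: name value [value]" with hexadecimal values, as in the excerpt
above. The helper names (parse_dump, changed_bits, compare) are
hypothetical and not part of ethtool or the r8169 driver. The sample input
reuses two registers from the diff above.

```python
import re

# Matches register lines such as "0x40: Tx Configuration    0x4f000f80".
# Assumption: every line is "offset: name value [value]" with hex values.
LINE_RE = re.compile(r"^(0x[0-9A-Fa-f]+):\s+(.*?)\s+((?:0x[0-9A-Fa-f]+\s*)+)$")


def parse_dump(text):
    """Return {offset: (name, [values])} for each register line in text."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            offset, name, values = m.groups()
            regs[offset] = (name, [int(v, 16) for v in values.split()])
    return regs


def changed_bits(old, new):
    """Return the bit positions (0 = least significant) that differ."""
    diff = old ^ new
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def compare(working, patched):
    """Yield (offset, name, old, new, bits) for every value that changed."""
    for offset, (name, old_vals) in working.items():
        new_vals = patched.get(offset, (name, old_vals))[1]
        for old, new in zip(old_vals, new_vals):
            if old != new:
                yield offset, name, old, new, changed_bits(old, new)


# Sample input taken from the diff above (two registers only).
WORKING = """\
0x40: Tx Configuration                        0x4f000f80
0x64: TBI control and status                  0x17ffff01
"""
PATCHED = """\
0x40: Tx Configuration                        0x4f000f00
0x64: TBI control and status                  0x00000000
"""

for offset, name, old, new, bits in compare(parse_dump(WORKING),
                                            parse_dump(PATCHED)):
    print(f"{offset} {name}: {old:#010x} -> {new:#010x}, bits {bits}")
```

For the Tx Configuration register, the two values differ only in bit 7
(0x80). What that bit means is chip-specific and is not interpreted here.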

Thanks in advance,
David Arendt

On 9/16/18 1:54 AM, Maciej S. Szmigiero wrote:
> [ I've added Realtek Linux NIC and netdev mailing lists to CC ]
>
> Hi David,
>
> On 15.09.2018 23:23, David Arendt wrote:
>> Hi,
>>
>> just a follow up:
>>
>> In kernel 4.18.8 the behaviour is different.
>>
>> The network becomes unreachable a number of times but recovers by
>> itself, before it finally is no longer reachable at all.
>>
>> Here is the log output:
>>
>> Sep 15 17:44:43 server kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit
>> queue 0 timed out
>> Sep 15 17:44:43 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:10:26 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:12:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:13:19 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:14:48 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:20:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:34:19 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:43:43 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 18:46:26 server kernel: r8169 0000:03:00.0 enp3s0: link up
>> Sep 15 19:00:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
>>
>> From 17:44 to 18:46 the network is recovering automatically. After the
>> link up at 19:00, the network is no longer reachable, without any
>> additional message.
>>
>> Looking at ifconfig, the counter for TX packets is incrementing, but
>> the counter for RX packets is not.
>>
>> Here again the driver from 4.17.14 is working flawlessly.
> Could you please try this patch on top of 4.18.8:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f74dd480cf4e31e12971c58a1d832044db945670
>
> In my case the problem fixed by the above commit was only limited to
> bad TX performance but my r8169 NIC models were different from what
> you have.
>
> If this does not help then try bisecting the issue
> (maybe limited to drivers/net/ethernet/realtek/r8169.c to save time).
> If the NIC dies after a heavy load it might be possible to generate
> such load quickly by in-kernel pktgen.
>
> If that's not possible then please at least compare NIC register
> values displayed by "ethtool -d enp3s0" between working and
> non-working kernels.
>
>> Thanks in advance,
>> David Arendt
> Maciej
>
>>
>> On 9/4/18 8:19 AM, David Arendt wrote:
>>> Hi,
>>>
>>> When using kernel 4.18.5 the Realtek 8111G network adapter stops
>>> responding under high system load.
>>>
>>> Dmesg is showing no errors.
>>>
>>> Sometimes an ifconfig enp3s0 down followed by an ifconfig enp3s0 up is
>>> enough for the network adapter to start responding again. Sometimes a
>>> reboot is necessary.
>>>
>>> When copying r8169.c from 4.17.14 to the 4.18.5 kernel, networking is
>>> perfectly stable on 4.18.5, so the problem seems to be r8169.c related.
>>>
>>> Here is the output from lshw:
>>>
>>>         *-pci:2
>>>              description: PCI bridge
>>>              product: 8 Series/C220 Series Chipset Family PCI Express
>>> Root Port #3
>>>              vendor: Intel Corporation
>>>              physical id: 1c.2
>>>              bus info: pci@0000:00:1c.2
>>>              version: d5
>>>              width: 32 bits
>>>              clock: 33MHz
>>>              capabilities: pci pciexpress msi pm normal_decode
>>> bus_master cap_list
>>>              configuration: driver=pcieport
>>>              resources: irq:18 ioport:d000(size=4096)
>>> memory:f7300000-f73fffff ioport:f2100000(size=1048576)
>>>            *-network
>>>                 description: Ethernet interface
>>>                 product: RTL8111/8168/8411 PCI Express Gigabit Ethernet
>>> Controller
>>>                 vendor: Realtek Semiconductor Co., Ltd.
>>>                 physical id: 0
>>>                 bus info: pci@0000:03:00.0
>>>                 logical name: enp3s0
>>>                 version: 0c
>>>                 serial: <hidden>
>>>                 size: 1Gbit/s
>>>                 capacity: 1Gbit/s
>>>                 width: 64 bits
>>>                 clock: 33MHz
>>>                 capabilities: pm msi pciexpress msix vpd bus_master
>>> cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt
>>> 1000bt-fd autonegotiation
>>>                 configuration: autonegotiation=on broadcast=yes
>>> driver=r8169 driverversion=2.3LK-NAPI duplex=full
>>> firmware=rtl8168g-2_0.0.1 02/06/13 latency=0 link=yes multicast=yes
>>> port=MII speed=1Gbit/s
>>>                 resources: irq:18 ioport:d000(size=256)
>>> memory:f7300000-f7300fff memory:f2100000-f2103fff
>>>
>>> Thanks in advance for looking into this,
>>>
>>> David Arendt
>>>
>>>