Re: R8169: Network lockups in 4.18.{8,9,10} (and 4.19 dev)

From: Chris Clayton
Date: Sun Oct 07 2018 - 15:36:31 EST


Hi again,

I didn't think there was anything in 4.19-rc7 to fix this regression, but tried it anyway. I can confirm that the
regression is still present and my network still fails when, after a resume from suspend (to ram or disk), I open my
browser or my mail client. In both those cases the failure is almost immediate - e.g. my home page doesn't get displayed
in the browser. Pinging one of my ISP's name servers doesn't fail quite so quickly, but the reported time increases from
14-15ms to more than 1000ms.
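
For anyone who wants to check the latency jump from a script rather than by eye, here is a minimal sketch. It assumes the iputils-style summary line that ping prints at the end ("rtt min/avg/max/mdev = ..."). The function names, the 100 ms threshold and the 192.0.2.1 address are illustrative placeholders, not anything taken from my setup.

```shell
#!/bin/sh
# Extract the average round-trip time (ms) from a ping summary read on stdin.
# Assumes a summary line of the form "rtt min/avg/max/mdev = a/b/c/d ms"
# (BSD-style "round-trip min/avg/max/stddev = ..." is matched as well).
avg_rtt_ms() {
    # Splitting on '/', the average is the fifth field.
    awk -F'/' '/^(rtt|round-trip) / { print $5 }'
}

# Succeed (exit 0) when the average in $1 exceeds the threshold in $2 (ms).
# An empty $1 (no ping summary at all) counts as 0, so it is NOT reported
# as degraded here; a total failure has to be checked separately.
rtt_degraded() {
    awk -v avg="$1" -v limit="$2" 'BEGIN { exit !(avg + 0 > limit + 0) }'
}

# Usage (192.0.2.1 is a placeholder address):
#   avg=$(ping -c 5 192.0.2.1 | avg_rtt_ms)
#   rtt_degraded "$avg" 100 && echo "latency degraded: ${avg} ms"
```

Running the usage lines once after a fresh boot and once after a resume should show the difference described above.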

Chris

On 04/10/2018 09:41, Chris Clayton wrote:
> Hi Heiner,
>
> Here's the reply to your questions. Sorry for the delay.
>
> On 28/09/2018 23:13, Heiner Kallweit wrote:
>> On 29.09.2018 00:00, Chris Clayton wrote:
>>> Thanks Maciej.
>>>
>>> On 28/09/2018 16:54, Maciej S. Szmigiero wrote:
>>>> Hi,
>>>>
>>>>> Hi,
>>>>>
>>>>> I upgraded my kernel to 4.18.10 recently and have since been experiencing network problems after resuming from a
>>>>> suspend to RAM or disk. I previously had 4.18.6 and that was OK.
>>>>>
>>>>> The pattern of the problem is that when I first boot, the network is fine. But, after resume from suspend I find that
>>>>> the time taken for a ping of one of my ISP's nameservers increases from 14-15ms to more than 1000ms. Moreover, when I
>>>>> open a browser (chromium or firefox), it fails to retrieve my home page (https://www.google.co.uk) and pings of the
>>>>> nameserver fail with the message "Destination Host Unreachable". Often, I can revive the network by stopping it with
>>>>> /sbin/if{down,up} but sometimes it is necessary to also remove the r8169 module and load it again.
>>>>
>>>> Please have a look at the following thread:
>>>> https://lkml.org/lkml/2018/9/25/1118
>>>>
>>>
>>> I applied your patch for the 4.18 stable kernels to 4.18.10, but the problem is not solved by it. Similarly, I applied
>>> Heiner's patch to 4.19, but again the problem is not solved.
>>>
>> I think we are talking about two different issues here. The issue that fix addresses has no link to suspend/resume.
>>
>> Chris, the lspci output doesn't provide enough detail to determine the exact chip version.
>> Can you provide the dmesg part with the XID?
>
> $ dmesg | grep r8169
> [ 5.274938] libphy: r8169: probed
> [ 5.276563] r8169 0000:05:00.2 eth0: RTL8411, 80:fa:5b:08:d0:3d, XID 48800800, IRQ 29
> [ 5.278158] r8169 0000:05:00.2 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> [ 9.275275] RTL8211E Gigabit Ethernet r8169-502:00: attached PHY driver [RTL8211E Gigabit Ethernet]
> (mii_bus:phy_addr=r8169-502:00, irq=IGNORE)
> [ 9.460876] r8169 0000:05:00.2 eth0: No native access to PCI extended config space, falling back to CSI
> [ 11.005336] r8169 0000:05:00.2 eth0: Link is Up - 100Mbps/Full - flow control rx/tx
>
>> According to your lspci output neither MSI nor MSI-X is active.
>> Do you have to use nomsi for whatever reason?
>>
>
> No, I do not use nomsi, but MSI wasn't enabled in my kernel config. I'm 99% sure that it used to be - I've no idea how
> it got dropped. If I'm not sure about an option, I start by taking the recommendation in the kconfig help. Help on MSI
> has a very clear "say Y". I've re-enabled it now.
>
> Chris
>
>> Heiner
>>
>>>> Maciej
>>>>
>>> Chris
>>>
>>
>>
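
The two checks discussed in the quoted exchange (the chip XID from dmesg, and whether MSI or MSI-X is active according to lspci) can be sketched as small filters. This is only an illustration: the function names are made up, the XID pattern is based on the single r8169 probe line quoted above, and the MSI test assumes the usual "MSI: Enable+" / "MSI-X: Enable+" wording in the capability lines of `lspci -vv`.

```shell
#!/bin/sh
# Print the XID from an r8169 probe line read on stdin, e.g.
#   "r8169 0000:05:00.2 eth0: RTL8411, <mac>, XID 48800800, IRQ 29"
# Lines without an XID field produce no output.
r8169_xid() {
    sed -n 's/.*r8169 .* XID \([0-9a-fA-F]*\),.*/\1/p'
}

# Succeed (exit 0) if the lspci -vv output on stdin shows MSI or MSI-X
# enabled, i.e. a capability line containing "MSI: Enable+" or
# "MSI-X: Enable+". "Enable-" means the capability exists but is unused.
msi_active() {
    grep -Eq 'MSI(-X)?: Enable\+'
}

# Usage (05:00.2 is the device address from the dmesg output above;
# lspci may need root to show the capability list):
#   dmesg | r8169_xid
#   lspci -vv -s 05:00.2 | msi_active && echo "MSI or MSI-X active"
```

If `msi_active` fails even without nomsi on the kernel command line, the kernel configuration (CONFIG_PCI_MSI) is worth checking, as it was in this case.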