Re: network problems with r8169

From: Heiner Kallweit
Date: Sat Jul 20 2019 - 05:11:36 EST


On 19.07.2019 23:12, Thomas Voegtle wrote:
> On Fri, 19 Jul 2019, Heiner Kallweit wrote:
>
>> On 18.07.2019 20:50, Thomas Voegtle wrote:
>>>
>>> Hello,
>>>
>>> I'm having network problems with the commits on r8169 since v5.2. There are ping packet loss, sometimes 100%, sometimes 50%. In the end network is unusable.
>>>
>>> v5.2 is fine, I bisected it down to:
>>>
>>> a2928d28643e3c064ff41397281d20c445525032 is the first bad commit
>>> commit a2928d28643e3c064ff41397281d20c445525032
>>> Author: Heiner Kallweit <hkallweit1@xxxxxxxxx>
>>> Date:ÂÂ Sun Jun 2 10:53:49 2019 +0200
>>>
>>> ÂÂÂ r8169: use paged versions of phylib MDIO access functions
>>>
>>> ÂÂÂ Use paged versions of phylib MDIO access functions to simplify
>>> ÂÂÂ the code.
>>>
>>> ÂÂÂ Signed-off-by: Heiner Kallweit <hkallweit1@xxxxxxxxx>
>>> ÂÂÂ Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>>>
>>>
>>> Reverting that commit on top of v5.2-11564-g22051d9c4a57 fixes the problem
>>> for me (had to adjust the renaming to r8169_main.c).
>>>
>>> I have a:
>>> 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev
>>> 0c)
>>> ÂÂÂÂÂÂÂ Subsystem: Biostar Microtech Int'l Corp Device [1565:2400]
>>> ÂÂÂÂÂÂÂ Kernel driver in use: r8169
>>>
>>> on a BIOSTAR H81MG motherboard.
>>>
>> Interesting. I have the same chip version (RTL8168g) and can't reproduce
>> the issue. Can you provide a full dmesg output and test the patch below
>> on top of linux-next? I'd be interested in the WARN_ON stack traces
>> (if any) and would like to know whether the experimental change to
>> __phy_modify_changed helps.
>>
>>>
>>> greetings,
>>>
>>> Â Thomas
>>>
>>>
>> Heiner
>>
>>
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 8d7dd4c5f..26be73000 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -1934,6 +1934,8 @@ static int rtl_get_eee_supp(struct rtl8169_private *tp)
>> ÂÂÂÂstruct phy_device *phydev = tp->phydev;
>> ÂÂÂÂint ret;
>>
>> +ÂÂÂ WARN_ON(phy_read(phydev, 0x1f));
>> +
>> ÂÂÂÂswitch (tp->mac_version) {
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_34:
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_35:
>> @@ -1957,6 +1959,8 @@ static int rtl_get_eee_lpadv(struct rtl8169_private *tp)
>> ÂÂÂÂstruct phy_device *phydev = tp->phydev;
>> ÂÂÂÂint ret;
>>
>> +ÂÂÂ WARN_ON(phy_read(phydev, 0x1f));
>> +
>> ÂÂÂÂswitch (tp->mac_version) {
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_34:
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_35:
>> @@ -1980,6 +1984,8 @@ static int rtl_get_eee_adv(struct rtl8169_private *tp)
>> ÂÂÂÂstruct phy_device *phydev = tp->phydev;
>> ÂÂÂÂint ret;
>>
>> +ÂÂÂ WARN_ON(phy_read(phydev, 0x1f));
>> +
>> ÂÂÂÂswitch (tp->mac_version) {
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_34:
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_35:
>> @@ -2003,6 +2009,8 @@ static int rtl_set_eee_adv(struct rtl8169_private *tp, int val)
>> ÂÂÂÂstruct phy_device *phydev = tp->phydev;
>> ÂÂÂÂint ret = 0;
>>
>> +ÂÂÂ WARN_ON(phy_read(phydev, 0x1f));
>> +
>> ÂÂÂÂswitch (tp->mac_version) {
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_34:
>> ÂÂÂÂcase RTL_GIGA_MAC_VER_35:
>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>> index 16667fbac..1aa1142b8 100644
>> --- a/drivers/net/phy/phy-core.c
>> +++ b/drivers/net/phy/phy-core.c
>> @@ -463,12 +463,10 @@ int __phy_modify_changed(struct phy_device *phydev, u32 regnum, u16 mask,
>> ÂÂÂÂÂÂÂ return ret;
>>
>> ÂÂÂÂnew = (ret & ~mask) | set;
>> -ÂÂÂ if (new == ret)
>> -ÂÂÂÂÂÂÂ return 0;
>>
>> -ÂÂÂ ret = __phy_write(phydev, regnum, new);
>> +ÂÂÂ __phy_write(phydev, regnum, new);
>>
>> -ÂÂÂ return ret < 0 ? ret : 1;
>> +ÂÂÂ return new != ret;
>> }
>> EXPORT_SYMBOL_GPL(__phy_modify_changed);
>>
>>
>
> Took your patch on top of next-20190719.
> See attached dmesg.
> It didn't work. Same thing, lots of ping drops, no usable network.
>
> like that:
> 44 packets transmitted, 2 received, 95% packet loss, time 44005ms
>
>
> Maybe important:
> I build a kernel with no modules.
>
> I have to power off when I booted a kernel which doesn't work, a (soft) reboot into a older kernel (e.g. 4.9.y)Â doesn't
> fix the problem. Powering off and on does.
>

Then, what you could do is reversing the hunks of the patch step by step.
Or make them separate patches and bisect.
Relevant are the hunks from point 1 and 2.

1. first 5 hunks (I don't think you have to reverse them individually)
EEE-related

2. rtl8168g_disable_aldps, rtl8168g_phy_adjust_10m_aldps, rtl8168g_1_hw_phy_config
all of these hunks are in the path for RTL8168g

3. rtl8168h_1_hw_phy_config, rtl8168h_2_hw_phy_config, rtl8168ep_1_hw_phy_config,
rtl8168ep_2_hw_phy_config
not in the path for RTL8168g

>
> greetings,
>
> ÂÂÂÂÂ Thomas
Heiner