Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume back

From: Florian Fainelli
Date: Tue Apr 06 2021 - 14:47:19 EST




On 4/6/2021 11:43 AM, Heiner Kallweit wrote:
> On 06.04.2021 20:32, Florian Fainelli wrote:
>>
>>
>> On 4/6/2021 4:42 AM, Heiner Kallweit wrote:
>>>
>>> Waiting for ANEG_COMPLETE to be set wouldn't be a good option. Aneg may never
>>> complete for different reasons, e.g. no physical link. And even if we use a
>>> timeout this may add unwanted delays.
>>>
>>>> Do you have any other insights that can help me further locate the issue? Thanks.
>>>>
>>>
>>> I think current MAC/PHY PM handling isn't perfect. Often we have the following
>>> scenario:
>>>
>>> *suspend*
>>> 1. PHY is suspended (mdio_bus_phy_suspend)
>>> 2. MAC suspend callback (typically involving phy_stop())
>>>
>>> *resume*
>>> 1. MAC resume callback (typically involving phy_start())
>>> 2. PHY is resumed (mdio_bus_phy_resume), incl. calling phy_init_hw()
>>>
>>> Calling phy_init_hw() after phy_start() doesn't look right.
>>> It seems to work in most cases, but there's a certain risk
>>> that phy_init_hw() overwrites something, e.g. the advertised
>>> modes.
>>> I think we have two valid scenarios:
>>>
>>> 1. phylib PM callbacks are used, then the MAC driver shouldn't
>>> touch the PHY in its PM callbacks, especially not call
>>> phy_stop/phy_start.
>>>
>>> 2. MAC PM callbacks take care also of the PHY. Then I think we would
>>> need a flag at the phy_device telling it to make the PHY PM
>>> callbacks a no-op.
>>
>> Maybe part of the problem is that the FEC is calling phy_{stop,start} in
>> its suspend/resume callbacks instead of phy_{suspend,resume} which would
>> play nice and tell the MDIO bus PM callbacks that the PHY has already
>> been suspended.
>>
> This basically is what I just proposed to test.

What you suggested to be tested is to let the MDIO bus PM callbacks deal
with suspending the PHY, which is different from having the MAC do it
explicitly, both would be interesting to try out.

>
>> I am also suspicious about whether Wake-on-LAN actually works with the
>> FEC, you cannot wake from LAN if the PHY is stopped and powered down.
>>
> phy_stop() calls phy_suspend() which checks for WoL. Therefore this
> should not be a problem.

Indeed, and I had missed that phy_suspend() checks netdev->wol_enabled,
thanks for reminding me.
--
Florian