Re: [PATCH] net: phy: fix PHY possibly unwork after MDIO bus resume back

From: Florian Fainelli
Date: Mon Apr 05 2021 - 19:19:11 EST




On 4/5/2021 7:58 AM, Heiner Kallweit wrote:
> On 05.04.2021 15:53, Christian Melki wrote:
>> On 4/5/21 2:09 PM, Heiner Kallweit wrote:
>>> On 05.04.2021 10:43, Christian Melki wrote:
>>>> On 4/5/21 12:48 AM, Heiner Kallweit wrote:
>>>>> On 04.04.2021 16:09, Heiner Kallweit wrote:
>>>>>> On 04.04.2021 12:07, Joakim Zhang wrote:
>>>>>>> commit 4c0d2e96ba055 ("net: phy: consider that suspend2ram may cut
>>>>>>> off PHY power") invokes phy_init_hw() when MDIO bus resume, it will
>>>>>>> soft reset PHY if PHY driver implements soft_reset callback.
>>>>>>> commit 764d31cacfe4 ("net: phy: micrel: set soft_reset callback to
>>>>>>> genphy_soft_reset for KSZ8081") adds soft_reset for KSZ8081. After these
>>>>>>> two patches, I found i.MX6UL 14x14 EVK which connected to KSZ8081RNB doesn't
>>>>>>> work any more when system resume back, MAC driver is fec_main.c.
>>>>>>>
>>>>>>> It's obvious that initializing PHY hardware when MDIO bus resume back
>>>>>>> would introduce some regression when PHY implements soft_reset. When I
>>>>>>
>>>>>> Why is this obvious? Please elaborate on why a soft reset should break
>>>>>> something.
>>>>>>
>>>>>>> am debugging, I found PHY works fine if MAC doesn't support suspend/resume
>>>>>>> or phy_stop()/phy_start() doesn't been called during suspend/resume. This
>>>>>>> let me realize, PHY state machine phy_state_machine() could do something
>>>>>>> breaks the PHY.
>>>>>>>
>>>>>>> As we known, MAC resume first and then MDIO bus resume when system
>>>>>>> resume back from suspend. When MAC resume, usually it will invoke
>>>>>>> phy_start() where to change PHY state to PHY_UP, then trigger the stat> machine to run now. In phy_state_machine(), it will start/config
>>>>>>> auto-nego, then change PHY state to PHY_NOLINK, what to next is
>>>>>>> periodically check PHY link status. When MDIO bus resume, it will
>>>>>>> initialize PHY hardware, including soft_reset, what would soft_reset
>>>>>>> affect seems various from different PHYs. For KSZ8081RNB, when it in
>>>>>>> PHY_NOLINK state and then perform a soft reset, it will never complete
>>>>>>> auto-nego.
>>>>>>
>>>>>> Why? That would need to be checked in detail. Maybe chip errata
>>>>>> documentation provides a hint.
>>>>>>
>>>>>
>>>>> The KSZ8081 spec says the following about bit BMCR_PDOWN:
>>>>>
>>>>> If software reset (Register 0.15) is
>>>>> used to exit power-down mode
>>>>> (Register 0.11 = 1), two software
>>>>> reset writes (Register 0.15 = 1) are
>>>>> required. The first write clears
>>>>> power-down mode; the second
>>>>> write resets the chip and re-latches
>>>>> the pin strapping pin values.
>>>>>
>>>>> Maybe this causes the issue you see and genphy_soft_reset() isn't
>>>>> appropriate for this PHY. Please re-test with the KSZ8081 soft reset
>>>>> following the spec comment.
>>>>>
>>>>
>>>> Interesting. Never expected that behavior.
>>>> Thanks for catching it. Skimmed through the datasheets/erratas.
>>>> This is what I found (micrel.c):
>>>>
>>>> 10/100:
>>>> 8001 - Unaffected?
>>>> 8021/8031 - Double reset after PDOWN.
>>>> 8041 - Errata. PDOWN broken. Recommended do not use. Unclear if reset
>>>> solves the issue since errata says no error after reset but is also
>>>> claiming that only toggling PDOWN (may) or power will help.
>>>> 8051 - Double reset after PDOWN.
>>>> 8061 - Double reset after PDOWN.
>>>> 8081 - Double reset after PDOWN.
>>>> 8091 - Double reset after PDOWN.
>>>>
>>>> 10/100/1000:
>>>> Nothing in gigabit afaics.
>>>>
>>>> Switches:
>>>> 8862 - Not affected?
>>>> 8863 - Errata. PDOWN broken. Reset will not help. Workaround exists.
>>>> 8864 - Not affected?
>>>> 8873 - Errata. PDOWN broken. Reset will not help. Workaround exists.
>>>> 9477 - Errata. PDOWN broken. Will randomly cause link failure on
>>>> adjacent links. Do not use.
>>>>
>>>> This certainly explains a lot.
>>>>
>>>>>>>
>>>>>>> This patch changes PHY state to PHY_UP when MDIO bus resume back, it
>>>>>>> should be reasonable after PHY hardware re-initialized. Also give state
>>>>>>> machine a chance to start/config auto-nego again.
>>>>>>>
>>>>>>
>>>>>> If the MAC driver calls phy_stop() on suspend, then phydev->suspended
>>>>>> is true and mdio_bus_phy_may_suspend() returns false. As a consequence
>>>>>> phydev->suspended_by_mdio_bus is false and mdio_bus_phy_resume()
>>>>>> skips the PHY hw initialization.
>>>>>> Please also note that mdio_bus_phy_suspend() calls phy_stop_machine()
>>>>>> that sets the state to PHY_UP.
>>>>>>
>>>>>
>>>>> Forgot that MDIO bus suspend is done before MAC driver suspend.
>>>>> Therefore disregard this part for now.
>>>>>
>>>>>> Having said that the current argumentation isn't convincing. I'm not
>>>>>> aware of such issues on other systems, therefore it's likely that
>>>>>> something is system-dependent.
>>>>>>
>>>>>> Please check the exact call sequence on your system, maybe it
>>>>>> provides a hint.
>>>>>>
>>>>>>> Signed-off-by: Joakim Zhang <qiangqing.zhang@xxxxxxx>
>>>>>>> ---
>>>>>>> drivers/net/phy/phy_device.c | 7 +++++++
>>>>>>> 1 file changed, 7 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>>>> index cc38e326405a..312a6f662481 100644
>>>>>>> --- a/drivers/net/phy/phy_device.c
>>>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>>>> @@ -306,6 +306,13 @@ static __maybe_unused int mdio_bus_phy_resume(struct device *dev)
>>>>>>> ret = phy_resume(phydev);
>>>>>>> if (ret < 0)
>>>>>>> return ret;
>>>>>>> +
>>>>>>> + /* PHY state could be changed to PHY_NOLINK from MAC controller resume
>>>>>>> + * rounte with phy_start(), here change to PHY_UP after re-initializing
>>>>>>> + * PHY hardware, let PHY state machine to start/config auto-nego again.
>>>>>>> + */
>>>>>>> + phydev->state = PHY_UP;
>>>>>>> +
>>>>>>> no_resume:
>>>>>>> if (phydev->attached_dev && phydev->adjust_link)
>>>>>>> phy_start_machine(phydev);
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> This is a quick draft of the modified soft reset for KSZ8081.
>>> Some tests would be appreciated.
>>>
>>>
>>> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
>>> index a14a00328..4902235a8 100644
>>> --- a/drivers/net/phy/micrel.c
>>> +++ b/drivers/net/phy/micrel.c
>>> @@ -1091,6 +1091,42 @@ static void kszphy_get_stats(struct phy_device *phydev,
>>> data[i] = kszphy_get_stat(phydev, i);
>>> }
>>>
>>> +int ksz8081_soft_reset(struct phy_device *phydev)
>>> +{
>>> + int bmcr, ret, val;
>>> +
>>> + phy_lock_mdio_bus(phydev);
>>> +
>>> + bmcr = __phy_read(phydev, MII_BMCR);
>>> + if (bmcr < 0)
>>> + return bmcr;
>>> +
>>> + bmcr |= BMCR_RESET;
>>> +
>>> + if (bmcr & BMCR_PDOWN)
>>> + __phy_write(phydev, MII_BMCR, bmcr);
>>> +
>>> + if (phydev->autoneg == AUTONEG_ENABLE)
>>> + bmcr |= BMCR_ANRESTART;
>>> +
>>> + __phy_write(phydev, MII_BMCR, bmcr & ~BMCR_ISOLATE);
>>> +
>>
>> Wouldn't this re-set BMCR_PDOWN?
>
> No. A soft reset is supposed to reset all bits to their defaults.

FWIW, this behavior is not universally accepted by PHY vendors and some
take liberties in how BMCR_RESET is implemented.
--
Florian