Re: [PATCH net] net: phy: Warn about incorrect mdio_bus_phy_resume() state
From: Marek Szyprowski
Date: Tue Aug 16 2022 - 07:46:15 EST
On 12.08.2022 18:32, Florian Fainelli wrote:
> On 8/12/22 04:19, Marek Szyprowski wrote:
>> Hi All,
>>
>> On 02.08.2022 01:34, Florian Fainelli wrote:
>>> Calling mdio_bus_phy_resume() with neither the PHY state machine set to
>>> PHY_HALTED nor phydev->mac_managed_pm set to true is a good indication
>>> that we can produce a race condition looking like this:
>>>
>>> CPU0 CPU1
>>> bcmgenet_resume
>>> -> phy_resume
>>> -> phy_init_hw
>>> -> phy_start
>>> -> phy_resume
>>> phy_start_aneg()
>>> mdio_bus_phy_resume
>>> -> phy_resume
>>> -> phy_write(..., BMCR_RESET)
>>> -> usleep() -> phy_read()
>>>
>>> with the phy_resume() function triggering a PHY behavior that might
>>> have
>>> to be worked around with (see bf8bfc4336f7 ("net: phy: broadcom: Fix
>>> brcm_fet_config_init()") for instance) that ultimately leads to an
>>> error
>>> reading from the PHY.
>>>
>>> Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC
>>> driver manages PHY PM")
>>> Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
>>
>> This patch, as probably intended, triggers a warning during system
>> suspend/resume cycle in the SMSC911x driver. I've observed it on ARM
>> Juno R1 board on the kernel compiled from next-202208010:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 398 at drivers/net/phy/phy_device.c:323
>> mdio_bus_phy_resume+0x34/0xc8
>> Modules linked in: smsc911x cpufreq_powersave cpufreq_conservative
>> crct10dif_ce ip_tables x_tables ipv6 [last unloaded: smsc911x]
>> CPU: 1 PID: 398 Comm: rtcwake Not tainted 5.19.0+ #940
>> Hardware name: ARM Juno development board (r1) (DT)
>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : mdio_bus_phy_resume+0x34/0xc8
>> lr : dpm_run_callback+0x74/0x350
>> ...
>> Call trace:
>> mdio_bus_phy_resume+0x34/0xc8
>> dpm_run_callback+0x74/0x350
>> device_resume+0xb8/0x258
>> dpm_resume+0x120/0x4a8
>> dpm_resume_end+0x14/0x28
>> suspend_devices_and_enter+0x164/0xa60
>> pm_suspend+0x25c/0x3a8
>> state_store+0x84/0x108
>> kobj_attr_store+0x14/0x28
>> sysfs_kf_write+0x60/0x70
>> kernfs_fop_write_iter+0x124/0x1a8
>> new_sync_write+0xd0/0x190
>> vfs_write+0x208/0x478
>> ksys_write+0x64/0xf0
>> __arm64_sys_write+0x14/0x20
>> invoke_syscall+0x40/0xf8
>> el0_svc_common.constprop.3+0x8c/0x120
>> do_el0_svc+0x28/0xc8
>> el0_svc+0x48/0xd0
>> el0t_64_sync_handler+0x94/0xb8
>> el0t_64_sync+0x15c/0x160
>> irq event stamp: 24406
>> hardirqs last enabled at (24405): [<ffff8000090c4734>]
>> _raw_spin_unlock_irqrestore+0x8c/0x90
>> hardirqs last disabled at (24406): [<ffff8000090b3164>]
>> el1_dbg+0x24/0x88
>> softirqs last enabled at (24144): [<ffff800008010488>]
>> _stext+0x488/0x5cc
>> softirqs last disabled at (24139): [<ffff80000809bf98>]
>> irq_exit_rcu+0x168/0x1a8
>> ---[ end trace 0000000000000000 ]---
>>
>> I hope the above information will help fixing the driver.
>
> Yes this is catching an actual issue in the driver in that the PHY
> state machine is still running while the system is trying to suspend.
> We could go about fixing it in a different number of ways, though I
> believe this one is probably correct enough to work and fix the warning:
Right, this fixes the warning (after fixing the minor compile issue, see
below).
Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>
> diff --git a/drivers/net/ethernet/smsc/smsc911x.c
> b/drivers/net/ethernet/smsc/smsc911x.c
> index 3bf20211cceb..e9c0668a4dc0 100644
> --- a/drivers/net/ethernet/smsc/smsc911x.c
> +++ b/drivers/net/ethernet/smsc/smsc911x.c
> @@ -1037,6 +1037,8 @@ static int smsc911x_mii_probe(struct net_device
> *dev)
> return ret;
> }
>
> + /* Indicate that the MAC is responsible for managing PHY PM */
> + phydev->mac_managed_pm = true;
> phy_attached_info(phydev);
>
> phy_set_max_speed(phydev, SPEED_100);
> @@ -2587,6 +2589,8 @@ static int smsc911x_suspend(struct device *dev)
> if (netif_running(ndev)) {
> netif_stop_queue(ndev);
> netif_device_detach(ndev);
> + if (!device_may_wakeup(dev))
> + phy_suspend(dev->phydev);
phy_suspend(ndev->phydev);
> }
>
> /* enable wake on LAN, energy detection and the external PME
> @@ -2628,6 +2632,8 @@ static int smsc911x_resume(struct device *dev)
> if (netif_running(ndev)) {
> netif_device_attach(ndev);
> netif_start_queue(ndev);
> + if (!device_may_wakeup(dev))
> + phy_resume(dev->phydev);
phy_resume(ndev->phydev);
> }
>
> return 0;
>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland