Re: [PATCH net v4] net: phy: Don't trigger state machine while in suspend
From: Andrew Lunn
Date: Wed Jun 29 2022 - 03:23:14 EST
On Tue, Jun 28, 2022 at 12:15:08PM +0200, Lukas Wunner wrote:
> Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(),
> but subsequent interrupts may retrigger it:
>
> They may have been left enabled to facilitate wakeup and are not
> quiesced until the ->suspend_noirq() phase. Unwanted interrupts may
> hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(),
> as well as between dpm_resume_noirq() and mdio_bus_phy_resume().
>
> Retriggering the phy_state_machine() through an interrupt is not only
> undesirable for the reason given in mdio_bus_phy_suspend() (freezing it
> midway with phydev->lock held), but also because the PHY may be
> inaccessible after it's suspended: Accesses to USB-attached PHYs are
> blocked once usb_suspend_both() clears the can_submit flag and PHYs on
> PCI network cards may become inaccessible upon suspend as well.
>
> Amend phy_interrupt() to avoid triggering the state machine if the PHY
> is suspended. Signal wakeup instead if the attached net_device or its
> parent has been configured as a wakeup source. (Those conditions are
> identical to mdio_bus_phy_may_suspend().) Postpone handling of the
> interrupt until the PHY has resumed.
>
> Before stopping the phy_state_machine() in mdio_bus_phy_suspend(),
> wait for a concurrent phy_interrupt() to run to completion. That is
> necessary because phy_interrupt() may have checked the PHY's suspend
> status before the system sleep transition commenced and it may thus
> retrigger the state machine after it was stopped.
>
> Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(),
> wait for a concurrent phy_interrupt() to complete to ensure that
> interrupts which it postponed are properly rerun.
>
> The issue was exposed by commit 1ce8b37241ed ("usbnet: smsc95xx: Forward
> PHY interrupts to PHY driver to avoid polling"), but has existed far
> longer (the fix is tagged for stable kernels back to v2.6.33).
>
> Fixes: 541cd3ee00a4 ("phylib: Fix deadlock on resume")
> Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@xxxxxxxxxxx/
> Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # v2.6.33+
Reviewed-by: Andrew Lunn <andrew@xxxxxxx>
Andrew