Re: [PATCH] net: phy: phy_device: fix PHY WOL enabled, PM failed to suspend

From: Florian Fainelli
Date: Fri Jun 28 2024 - 04:26:09 EST




On 6/28/2024 9:17 AM, Russell King (Oracle) wrote:
On Fri, Jun 28, 2024 at 02:03:18PM +0800, Youwan Wang wrote:
If the PHY of the mido bus is enabled with Wake-on-LAN (WOL),
we cannot suspend the PHY. Although the WOL status has been
checked in phy_suspend(), returning -EBUSY(-16) would cause
the Power Management (PM) to fail to suspend. Since
phy_suspend() is an exported symbol (EXPORT_SYMBOL),
timely error reporting is needed. Therefore, an additional
check is performed here. If the PHY of the mido bus is enabled
with WOL, we skip calling phy_suspend() to avoid PM failure.

log:
[ 322.631362] OOM killer disabled.
[ 322.631364] Freezing remaining freezable tasks
[ 322.632536] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 322.632540] printk: Suspending console(s) (use no_console_suspend to debug)
[ 322.633052] YT8521 Gigabit Ethernet stmmac-0:01:
PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x110 [libphy] returns -16
[ 322.633071] YT8521 Gigabit Ethernet stmmac-0:01:
PM: failed to suspend: error -16
[ 322.669699] PM: Some devices failed to suspend, or early wake event detected
[ 322.669949] OOM killer enabled.
[ 322.669951] Restarting tasks ... done.
[ 322.671008] random: crng reseeded on system resumption
[ 322.671014] PM: suspend exit

If the YT8521 driver adds phydrv->flags, ask the YT8521 driver to process
WOL at suspend and resume time, the phydev->suspended_by_mdio_bus=1
flag would cause the resume failure.

Did you mean to write that if the YT8521 PHY driver entry set the PHY_ALWAYS_CALL_SUSPEND flag, then it would cause an error during resume? If so, why is that?


I think the reason this is happening is because the PHY has WoL enabled
on it without the kernel/netdev driver being aware that WoL is enabled.
Thus, mdio_bus_phy_may_suspend() returns true, allowing the suspend to
happen, but then we find unexpectedly that WoL is enabled on the PHY.

However, whenever a user configures WoL, netdev->wol_enabled will be
set when _any_ WoL mode is enabled and cleared only if all WoL modes
are disabled.

Thus, what we have is a de-sync between the kernel state and hardware
state, leading to the suspend failing.

I don't see anything in the motorcomm driver that requires suspend
if WoL is enabled - yt8521_suspend() first checks to see whether WoL
is enabled, and exits if it is.

Andrew - how do you feel about reading the WoL state from the PHY and
setting netdev->wol_enabled if any WoL is enabled on the PHY? That
would mean that the netdev's WoL state is consistent with the PHY
whether or not the user has configured WoL.

Would not the situation described here be solved by having the Motorcomm PHY driver set PHY_ALWAYS_CALL_SUSPEND since it deals with checking whether WoL is enabled or not and will just return then.
--
Florian