Re: [RFC PATCH net] Revert "net: phy: Uniform PHY driver access"

From: Vladimir Oltean
Date: Sun Sep 12 2021 - 17:39:23 EST


On Sun, Sep 12, 2021 at 10:49:25PM +0200, Gerhard Engleder wrote:
> > This reverts commit 3ac8eed62596387214869319379c1fcba264d8c6.
> >
> > I am not actually sure I follow the patch author's logic, because the
> > change does more than it says on the box, but this patch breaks
> > suspend/resume on NXP LS1028A and probably on any other systems which
> > have PHY devices with no driver bound, because the patch has removed the
> > "!phydev->drv" check without actually explaining why that is fine.
>
> The wrong assumption was that the driver is set for every device during probe
> before suspend. Intention of the patch was only clean up of
> to_phy_driver() usage.

I am not sure why "to_phy_driver" needs cleanup. Au contraire, I think
the PHY library's usage of struct phy_device :: drv is what is strange
and potentially buggy, it is the only subsystem I know of that keeps its
own driver pointer rather than looking at struct device :: driver.
I think this is largely for historical reasons (it has done this since
the first commit), but it looks to me like to_phy_driver could be used
as part of a larger macro called something like phydev_get_drv which
retrieves the phy_driver from the phydev->mdio.dev.driver.

I say it is buggy because when probing fails ("fails" includes things
like -EPROBE_DEFER) it does not even bother to clear phydev->drv back to
NULL, even though the device will not have a driver pointer. There are
also other things which it does not clean up on probe failure, btw, each
with its own interesting side effects.

> > static bool mdio_bus_phy_may_suspend(struct phy_device *phydev)
> > {
> > + struct device_driver *drv = phydev->mdio.dev.driver;
> > + struct phy_driver *phydrv = to_phy_driver(drv);
> > struct net_device *netdev = phydev->attached_dev;
> >
> > - if (!phydev->drv->suspend)
> > + if (!drv || !phydrv->suspend)
> > return false;
> >
> > /* PHY not attached? May suspend if the PHY has not already been
>
> I suggest to add the "!phydev->drv" check, but others may know it
> better than me.

So in this case, the difference will be that with your change, a
phy_probe that returns an error code like -EPROBE_DEFER will have the
phydev->drv set, and it will not return false ("may not suspend") quickly,
while the code in its original form will not attempt to suspend that PHY.

The implication is that we may call the ->suspend method of a PHY that
is deferring probe, _before_ the probe has actually succeeded.

To me, making that change and moving the code in yet a third state is
way outside of the scope, which was to restore it to a known working
condition (aka bug fix). If you want to make that change, feel free, I will not.