Re: [PATCH net-next v3] net: phy: Fix suspicious rcu_dereference usage
From: Kory Maincent
Date: Tue Jan 21 2025 - 08:02:41 EST
On Tue, 21 Jan 2025 11:34:48 +0000
Paul Barker <paul.barker.ct@xxxxxxxxxxxxxx> wrote:
> On 21/01/2025 09:38, Kory Maincent wrote:
> > On Mon, 20 Jan 2025 11:12:28 -0800
> > Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> >> On Mon, 20 Jan 2025 15:19:25 +0100 Kory Maincent wrote:
> [...]
> >>
> >> I maintain that ravb is buggy, plenty of drivers take rtnl_lock
> >> from the .suspend callback. We need _some_ write protection here,
> >> the patch as is only silences a legitimate warning.
> >
> > Indeed, if the suspend path is buggy we should fix it. Still, there are lots
> > of Ethernet drivers calling phy_disconnect() without rtnl (IIUC) if probe
> > returns an error, or in the remove path. What should we do about them?
> >
> > As for ravb suspend, I don't have the board. Claudiu, could you try this
> > instead of the current fix:
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index bc395294a32d..c9a0d2d6f371 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -3215,15 +3215,22 @@ static int ravb_suspend(struct device *dev)
> >  	if (!netif_running(ndev))
> >  		goto reset_assert;
> >
> > +	rtnl_lock();
> >  	netif_device_detach(ndev);
> >
> > -	if (priv->wol_enabled)
> > -		return ravb_wol_setup(ndev);
> > +	if (priv->wol_enabled) {
> > +		ret = ravb_wol_setup(ndev);
> > +		rtnl_unlock();
> > +		return ret;
> > +	}
> >
> >  	ret = ravb_close(ndev);
> > -	if (ret)
> > +	if (ret) {
> > +		rtnl_unlock();
> >  		return ret;
> > +	}
> >
> > +	rtnl_unlock();
> >  	ret = pm_runtime_force_suspend(&priv->pdev->dev);
> >  	if (ret)
> >  		return ret;
> >
> > Regards,
>
> (Cc'ing Niklas and Sergey as this relates to the ravb driver)
Yes, thanks.
> Why do we need to hold the rtnl mutex across the calls to
> netif_device_detach() and ravb_wol_setup()?
>
> My reading of Documentation/networking/netdevices.rst is that the rtnl
> mutex is held when the net subsystem calls the driver's ndo_stop method,
> which in our case is ravb_close(). So, we should take the rtnl mutex
> when we call ravb_close() directly, in both ravb_suspend() and
> ravb_wol_restore(). That would ensure that we do not call
> phy_disconnect() without holding the rtnl mutex and should fix this
> issue.
I'm not sure about that. For example, ravb_ptp_stop(), which is called from
ravb_wol_setup(), would not be protected by the rtnl lock.
I don't know about netif_device_detach(). It doesn't seem to require the lock,
as lots of drivers call it without holding rtnl.
Indeed, we should also take the rtnl lock in the resume path.
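To make the locking shape under discussion concrete, here is a rough
user-space sketch, not driver code. Everything prefixed with fake_ is a
hypothetical stand-in (a plain flag instead of the real rtnl mutex, stubs
instead of ravb_wol_setup() and ravb_close()), and it only models the region
that would run under the lock. It assumes both the WoL path and the close path
should see the lock held, and that every exit path releases it exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtnl mutex: a flag, not a real lock. */
static bool fake_rtnl_held;

static void fake_rtnl_lock(void)
{
	assert(!fake_rtnl_held);	/* catch double-locking in the model */
	fake_rtnl_held = true;
}

static void fake_rtnl_unlock(void)
{
	assert(fake_rtnl_held);		/* catch an unbalanced unlock */
	fake_rtnl_held = false;
}

/* Hypothetical model of the state touched by the suspend path. */
struct fake_priv {
	bool running;
	bool wol_enabled;
	int close_ret;		/* value the fake close should return */
	bool callee_saw_lock;	/* recorded by the stubs below */
};

/* Stub standing in for the WoL setup step; records the lock state. */
static int fake_wol_setup(struct fake_priv *priv)
{
	priv->callee_saw_lock = fake_rtnl_held;
	return 0;
}

/* Stub standing in for the close step; records the lock state. */
static int fake_close(struct fake_priv *priv)
{
	priv->callee_saw_lock = fake_rtnl_held;
	return priv->close_ret;
}

static int fake_suspend(struct fake_priv *priv)
{
	int ret;

	if (!priv->running)
		return 0;

	fake_rtnl_lock();
	if (priv->wol_enabled)
		ret = fake_wol_setup(priv);
	else
		ret = fake_close(priv);
	fake_rtnl_unlock();	/* single unlock shared by every exit path */

	return ret;
}
```

Collecting the result in ret and unlocking once avoids repeating the unlock
before each early return. Whether the real suspend and resume paths can be
restructured this way depends on what has to run outside the lock.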
> Commit 35f7cad1743e ("net: Add the possibility to support a selected
> hwtstamp in netdevice") may have unearthed the issue, but the fixes tag
> should point to the commits adding those unlocked calls to ravb_close().
The current patch touches phy_device.c, which is why the Fixes tag does not
point to a ravb commit. It will change.
Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com