Re: [PATCH net-next v3] net: phy: Fix suspicious rcu_dereference usage
From: Paul Barker
Date: Tue Jan 21 2025 - 06:35:16 EST

On 21/01/2025 09:38, Kory Maincent wrote:
> On Mon, 20 Jan 2025 11:12:28 -0800
> Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
>> On Mon, 20 Jan 2025 15:19:25 +0100 Kory Maincent wrote:
>>> The path reported as not holding the RTNL lock is the suspend path of
>>> the ravb MAC driver. Without this fix we got this warning:
>>
>> I maintain that ravb is buggy, plenty of drivers take rtnl_lock
>> from the .suspend callback. We need _some_ write protection here,
>> the patch as is only silences a legitimate warning.
>
> Indeed, if the suspend path is buggy we should fix it. Still, there are lots of
> Ethernet drivers calling phy_disconnect() without rtnl (IIUC) if probe returns
> an error or in the remove path. What should we do about it?
>
> About ravb suspend: I don't have the board. Claudiu, could you try this instead
> of the current fix:
>
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index bc395294a32d..c9a0d2d6f371 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -3215,15 +3215,22 @@ static int ravb_suspend(struct device *dev)
>  	if (!netif_running(ndev))
>  		goto reset_assert;
>
> +	rtnl_lock();
>  	netif_device_detach(ndev);
>
> -	if (priv->wol_enabled)
> -		return ravb_wol_setup(ndev);
> +	if (priv->wol_enabled) {
> +		ret = ravb_wol_setup(ndev);
> +		rtnl_unlock();
> +		return ret;
> +	}
>
>  	ret = ravb_close(ndev);
> -	if (ret)
> +	if (ret) {
> +		rtnl_unlock();
>  		return ret;
> +	}
>
> +	rtnl_unlock();
>  	ret = pm_runtime_force_suspend(&priv->pdev->dev);
>  	if (ret)
>  		return ret;
>
> Regards,

(Cc'ing Niklas and Sergey as this relates to the ravb driver)

Why do we need to hold the rtnl mutex across the calls to
netif_device_detach() and ravb_wol_setup()?

My reading of Documentation/networking/netdevices.rst is that the rtnl
mutex is held when the net subsystem calls the driver's ndo_stop method,
which in our case is ravb_close(). So, we should take the rtnl mutex
when we call ravb_close() directly, in both ravb_suspend() and
ravb_wol_restore(). That would ensure that we do not call
phy_disconnect() without holding the rtnl mutex and should fix this
issue.
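
To illustrate the narrower locking, here is a rough userspace model in C. It is
not kernel code and not a proposed patch: every model_* name is a hypothetical
stand-in, the "lock" is a plain single-threaded flag rather than the real rtnl
mutex, and the violation counter stands in for the lockdep warning. It only
shows the lock being taken around the direct call to the ndo_stop
implementation.

```c
/* Single-threaded userspace model; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

static bool model_rtnl_held;      /* stand-in for "rtnl mutex is held" */
static int model_unlocked_calls;  /* disconnects seen without the lock */

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);     /* the model lock is not recursive */
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

/* Stand-in for phy_disconnect(): counts calls made without the lock. */
static void model_phy_disconnect(void)
{
	if (!model_rtnl_held)
		model_unlocked_calls++;
}

/* Stand-in for ravb_close(), i.e. the driver's ndo_stop method. */
static int model_ravb_close(void)
{
	model_phy_disconnect();
	return 0;
}

/* Suspend path calling the stop method directly, with no locking. */
static int model_suspend_unlocked(void)
{
	return model_ravb_close();
}

/* Suspend path taking the lock only around the direct stop call. */
static int model_suspend_locked(void)
{
	int ret;

	model_rtnl_lock();
	ret = model_ravb_close();
	model_rtnl_unlock();
	return ret;
}
```

In this model, model_suspend_unlocked() bumps model_unlocked_calls while
model_suspend_locked() does not. Whether the real driver also needs the lock
around netif_device_detach() and ravb_wol_setup() is the open question above.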

Commit 35f7cad1743e ("net: Add the possibility to support a selected
hwtstamp in netdevice") may have unearthed the issue, but the Fixes tag
should point to the commits adding those unlocked calls to ravb_close().

I am not super familiar with the rtnl lock so let me know if I've missed
something.

Thanks,

--
Paul Barker