Re: [PATCH net-next v3] net: phy: Fix suspicious rcu_dereference usage
From: Paul Barker
Date: Wed Jan 22 2025 - 09:04:24 EST
On 21/01/2025 16:11, Kory Maincent wrote:
> On Tue, 21 Jan 2025 15:44:34 +0000
> Paul Barker <paul.barker.ct@xxxxxxxxxxxxxx> wrote:
>
>> On 21/01/2025 13:01, Kory Maincent wrote:
>>
>>> On Tue, 21 Jan 2025 11:34:48 +0000
>>> Paul Barker <paul.barker.ct@xxxxxxxxxxxxxx> wrote:
>>>
>>>> Why do we need to hold the rtnl mutex across the calls to
>>>> netif_device_detach() and ravb_wol_setup()?
>>>>
>>>> My reading of Documentation/networking/netdevices.rst is that the rtnl
>>>> mutex is held when the net subsystem calls the driver's ndo_stop method,
>>>> which in our case is ravb_close(). So, we should take the rtnl mutex
>>>> when we call ravb_close() directly, in both ravb_suspend() and
>>>> ravb_wol_restore(). That would ensure that we do not call
>>>> phy_disconnect() without holding the rtnl mutex and should fix this
>>>> issue.
>>>
>>> Not sure about it. For example ravb_ptp_stop() called in ravb_wol_setup()
>>> won't be protected by the rtnl lock.
>>
>> ravb_ptp_stop() modifies a couple of device registers and calls
>> ptp_clock_unregister(). I don't see anything to suggest that this
>> requires the rtnl lock to be held, unless I am missing something.
>
> What happens if two ptp_clock_unregister() with the same ptp_clock pointer are
> called simultaneously? From ravb_suspend and ravb_set_ringparam for example. It
> may cause some errors.
> For example the ptp->kworker pointer could be used after a kfree.
> https://elixir.bootlin.com/linux/v6.12.6/source/drivers/ptp/ptp_clock.c#L416
I've dug into this some more today and looked at the suspend/resume
paths of other Ethernet drivers for comparison.
netif_device_detach() and netif_device_attach() seem to be safe to call
without holding the rtnl lock, e.g. the stmmac driver does this.
In the suspend path, we should hold the rtnl lock across the calls to
ravb_wol_setup() and ravb_close().
In the resume path, we should hold the rtnl lock across the calls to
ravb_wol_restore() and ravb_open(). I don't think there is any harm in
holding the rtnl lock while we call pm_runtime_force_resume(), so we can
take the lock before checking priv->wol_enabled and release after
calling ravb_open().
How does that sound?
Thanks,
--
Paul BarkerAttachment:
OpenPGP_0x27F4B3459F002257.asc
Description: OpenPGP public key
Attachment:
OpenPGP_signature.asc
Description: OpenPGP digital signature