Re: [PATCH] net: ethernet: nxp: Fix a possible memory leak in lpc_mii_probe()

From: Jinjie Ruan
Date: Mon Sep 09 2024 - 09:17:55 EST




On 2024/9/9 20:54, Andrew Lunn wrote:
> On Mon, Sep 09, 2024 at 05:29:48PM +0800, Jinjie Ruan wrote:
>> of_phy_find_device() calls bus_find_device(), which calls get_device()
>> on the returned struct device * to increment the refcount. The current
>> implementation does not decrement the refcount, which causes memory leak.
>>
>> So add the missing phy_device_free() call to decrement the
>> refcount via put_device() to balance the refcount.
>
> Why is a device reference counted?
>
> To stop is disappearing.
>
>> @@ -768,6 +768,9 @@ static int lpc_mii_probe(struct net_device *ndev)
>> return -ENODEV;
>> }
>>
>> + if (pldat->phy_node)
>> + phy_device_free(phydev);
>> +
>> phydev = phy_connect(ndev, phydev_name(phydev),
>> &lpc_handle_link_change,
>> lpc_phy_interface_mode(&pldat->pdev->dev));
>
> Think about this code. We use of_phy_find_device to get the device,
> taking a reference on it. While we hold that reference, we know it
> cannot disappear and we passed it to phy_connect(), passing it into
> the phylib layer. Deep down, phy_attach_direct() is called which does
> a get_device() taking a reference on the device. That is the phylib
> layer saying it is using it, it does not want it to disappear.
>
> Now think about your change. As soon as you new phy_device_free() is
> called, the device can disappear. phylib is then going to use
> something which has gone. Bad things will happen.

Hi, Andrew,
Thank you to share me your a wealth of relevant knowledge. My knowledge
of reference count is relatively shallow.

>
> So with changes like this, you need to think about lifetimes of things
> being protected by a reference count. When has lpc_mii_probe(), or the
> lpc driver as a whole finished with phydev? There are two obvious
> alternatives i can think of.
>
> 1) It wants to keep hold of the reference until the driver remove() is
> called, so you should be releasing the reference in
> lpc_eth_drv_remove().
>
> 2) Once the phydev is passed to the phylib layer for it to manage,
> this driver does not need to care about it any more. So it just needs
> to hold the reference until after phy_connect() returns.

I think this is a good chance to put the device after phy_connect().

>
> Memory leaks are an annoyance, but generally have little effect,
> especially in probe/remove code which gets called once. Accessing
> something which has gone is going to cause an Opps.
>
> So, you need to think about the lifetime of objects you are
> manipulating the reference counts on. You want to state in the commit
> message your understanding of these lifetimes so the reviewer can
> sanity check them.>
> FYI: Ignore anything you have learned while fixing device tree
> reference counting bugs. Lifetimes of OF objects is probably very
> broken.
>
> Andrew
>
> ---
> pw-bot: cr