Re: [PATCH] net: ethernet: nxp: Fix a possible memory leak in lpc_mii_probe()

From: Andrew Lunn
Date: Mon Sep 09 2024 - 08:54:46 EST


On Mon, Sep 09, 2024 at 05:29:48PM +0800, Jinjie Ruan wrote:
> of_phy_find_device() calls bus_find_device(), which calls get_device()
> on the returned struct device * to increment the refcount. The current
> implementation does not decrement the refcount, which causes memory leak.
>
> So add the missing phy_device_free() call to decrement the
> refcount via put_device() to balance the refcount.

Why is a device reference counted?

To stop is disappearing.

> @@ -768,6 +768,9 @@ static int lpc_mii_probe(struct net_device *ndev)
> return -ENODEV;
> }
>
> + if (pldat->phy_node)
> + phy_device_free(phydev);
> +
> phydev = phy_connect(ndev, phydev_name(phydev),
> &lpc_handle_link_change,
> lpc_phy_interface_mode(&pldat->pdev->dev));

Think about this code. We use of_phy_find_device to get the device,
taking a reference on it. While we hold that reference, we know it
cannot disappear and we passed it to phy_connect(), passing it into
the phylib layer. Deep down, phy_attach_direct() is called which does
a get_device() taking a reference on the device. That is the phylib
layer saying it is using it, it does not want it to disappear.

Now think about your change. As soon as you new phy_device_free() is
called, the device can disappear. phylib is then going to use
something which has gone. Bad things will happen.

So with changes like this, you need to think about lifetimes of things
being protected by a reference count. When has lpc_mii_probe(), or the
lpc driver as a whole finished with phydev? There are two obvious
alternatives i can think of.

1) It wants to keep hold of the reference until the driver remove() is
called, so you should be releasing the reference in
lpc_eth_drv_remove().

2) Once the phydev is passed to the phylib layer for it to manage,
this driver does not need to care about it any more. So it just needs
to hold the reference until after phy_connect() returns.

Memory leaks are an annoyance, but generally have little effect,
especially in probe/remove code which gets called once. Accessing
something which has gone is going to cause an Opps.

So, you need to think about the lifetime of objects you are
manipulating the reference counts on. You want to state in the commit
message your understanding of these lifetimes so the reviewer can
sanity check them.

FYI: Ignore anything you have learned while fixing device tree
reference counting bugs. Lifetimes of OF objects is probably very
broken.

Andrew

---
pw-bot: cr