Re: [PATCH v2 01/15] ARM: actions: fix a leaked reference by adding missing of_node_put

From: Russell King - ARM Linux admin
Date: Sat Mar 09 2019 - 03:27:10 EST


On Sat, Mar 09, 2019 at 07:47:42AM +0530, Manivannan Sadhasivam wrote:
> Hi Russell,
>
> On Tue, Mar 05, 2019 at 11:40:48AM +0000, Russell King - ARM Linux admin wrote:
> > On Tue, Mar 05, 2019 at 07:33:52PM +0800, Wen Yang wrote:
> > > The call to of_get_next_child returns a node pointer with refcount
> > > incremented thus it must be explicitly decremented after the last
> > > usage.
> > >
> > > Detected by coccinelle with the following warnings:
> > > ./arch/arm/mach-actions/platsmp.c:112:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 103, but without a corresponding object release within this function.
> > > ./arch/arm/mach-actions/platsmp.c:124:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 115, but without a corresponding object release within this function.
> > > ./arch/arm/mach-actions/platsmp.c:137:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 128, but without a corresponding object release within this function.
> >
> > I question this. Your reasoning is that the node is no longer used
> > so the reference count needs to be put.
> >
> > However, in all these cases, data is read from the nodes' properties
> > and the device remains in use for the life of the kernel. There is
> > a big difference here.
> >
> > With normal drivers, each device is bound to its associated device
> > node. When the device node goes away, the corresponding device goes
> > away too, which causes the driver to be unbound from the device.
> >
> > However, there is another class of "driver", such as the ones below,
> > whose devices are "permanent". These can never go away: even if the
> > device node refcount hits zero and the device node is freed, the
> > device is still present and in use in the system. So, having the
> > device node refcount hit zero is actually a bug: what that's saying
> > is that the system device (e.g., the SCU) has gone away. If you somehow
> > were to remove the SCU from the system, you'd end up severing the
> > connection between the CPU cores and the rest of the system, obviously
> > resulting in a dead system!
> >
> > So, what is the point of dropping these refcounts for devices that can
> > never go away - and thus their associated device_node should also never
> > go away?
> >
>
> Yes, practically we would never hit this case, but theoretically we should
> decrement the refcount for nodes/properties whenever we are done with them.
> As you know, there are any number of places in the kernel where the
> refcount is not put after use. So I would welcome these kinds of patches
> to set an example for anyone who uses the of_* calls in the future.
>
> IMO, DT should've handled the refcount internally without exposing the
> pointers to external world.

It doesn't; that's my point.

In the case of normal drivers, there's an _extra_ refcount held by the
device that is created (see the of_node_get() in of_device_alloc()).
This refcount exists for the lifetime of the device structure, which
bounds the lifetime of the device's availability to the driver.

In effect, while the device driver is bound, there is a refcount on
the device node. So, the device node is guaranteed to be around for
as long as the device driver is bound to the device.
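To make the normal-driver case concrete, here is a minimal userspace sketch (not kernel code). `struct mock_node`, `mock_node_get()`, `mock_node_put()` and `struct mock_device` are hypothetical stand-ins for `struct device_node`, of_node_get(), of_node_put() and the device set up by of_device_alloc(); the model only tracks a counter and frees the node when it reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a refcounted device node; not kernel code. */
struct mock_node {
	int refcount;
	bool *freed;	/* set to true when the node is released */
};

/* Returns a node holding one reference, owned by the caller (e.g. a lookup). */
static struct mock_node *mock_node_alloc(bool *freed)
{
	struct mock_node *np = malloc(sizeof(*np));

	if (!np)
		return NULL;
	np->refcount = 1;
	np->freed = freed;
	*freed = false;
	return np;
}

static struct mock_node *mock_node_get(struct mock_node *np)
{
	np->refcount++;
	return np;
}

static void mock_node_put(struct mock_node *np)
{
	assert(np->refcount > 0);	/* an underflow would be a refcount bug */
	if (--np->refcount == 0) {
		*np->freed = true;
		free(np);
	}
}

/* Hypothetical device that pins its node, as of_device_alloc() does. */
struct mock_device {
	struct mock_node *of_node;
};

static void mock_device_init(struct mock_device *dev, struct mock_node *np)
{
	dev->of_node = mock_node_get(np);	/* the device's _extra_ reference */
}

static void mock_device_release(struct mock_device *dev)
{
	mock_node_put(dev->of_node);		/* dropped only when the device goes */
	dev->of_node = NULL;
}
```

Dropping the lookup reference right after creating the device is safe in this model only because the device's own reference keeps the node alive until mock_device_release().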

For the cases being addressed in these patches, there is no driver, so
nothing bounds the lifetime: the expectation is that the lifetime is
the duration of the kernel. If such a device node were to be deleted,
there would be no way to unbind the driver, and if we had dropped the
refcount, the device node would be freed immediately, even though the
device is still in use.

These are a different "class" of driver.
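The "permanent" case can be modelled the same way. In this hypothetical sketch, mock_find_node() stands in for a lookup that returns the node with its refcount incremented (as of_get_next_child() does), mock_scu_setup() stands in for early SMP setup code that reads a property once and then either drops or keeps its reference, and mock_tree_remove() stands in for the node being deleted from the tree. None of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; names are illustrative, not kernel APIs. */
struct mock_node {
	int refcount;
	unsigned long reg;	/* stands in for a property such as a base address */
};

static struct mock_node *mock_tree_node;	/* the tree's own reference */
static int mock_nodes_freed;
static struct mock_node *scu_node;		/* deliberately kept reference */

static void mock_node_put(struct mock_node *np)
{
	assert(np->refcount > 0);
	if (--np->refcount == 0) {
		mock_nodes_freed++;
		free(np);
	}
}

/* Adds a node to the "tree"; the tree holds one reference. */
static void mock_tree_add(unsigned long reg)
{
	mock_tree_node = malloc(sizeof(*mock_tree_node));
	assert(mock_tree_node);
	mock_tree_node->refcount = 1;
	mock_tree_node->reg = reg;
}

/* Lookup returns the node with its refcount incremented. */
static struct mock_node *mock_find_node(void)
{
	mock_tree_node->refcount++;
	return mock_tree_node;
}

/* Deleting the node from the tree drops the tree's reference. */
static void mock_tree_remove(void)
{
	mock_node_put(mock_tree_node);
	mock_tree_node = NULL;
}

/* "Permanent" user: no driver and no unbind, so nothing bounds its lifetime. */
static unsigned long mock_scu_setup(bool drop_ref)
{
	struct mock_node *np = mock_find_node();
	unsigned long base = np->reg;	/* read the property once */

	if (drop_ref)
		mock_node_put(np);	/* the change the patch proposes */
	else
		scu_node = np;		/* keep it: the device never goes away */
	return base;
}
```

With `drop_ref` set, removing the node frees it while the base address read from it is still in use by the system; keeping the reference ties the node's lifetime to that of the kernel, matching the expectation for these devices.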

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up