Re: [PATCH 1/1] sofware node: Only the managing device can unreference managed software node

From: Mike Isely

Date: Wed Feb 25 2026 - 14:42:48 EST


On Wed, 25 Feb 2026, Andy Shevchenko wrote:

> On Tue, Feb 24, 2026 at 01:19:22PM -0600, mike.isely@xxxxxxxxxxxxxxxxx wrote:
> > From: Mike Isely <mike.isely@xxxxxxxxxxxxxxxxx>
>
> Author's and...
>
> > A scenario exists where device_create_managed_software_node() is used
> > to create an swnode instance that will be implicitly shared to a child
> > device despite best intentions not to permit such sharing (per the
> > comment in device_create_managed_software_node()). I encountered this
> > with the sfp kernel module when it was instantiated with properties
>
> SFP? Or is it the name of the actual module in the kernel?

Actual kernel module name, sfp.ko, CONFIG_SFP in .config, named after
the piece of hardware it works with, an SFP cage. This is logic which
monitors SFP cages for hotplug appearance / removal of SFP transceivers.
When a transceiver appears, the sfp kernel module will create a child
hwmon device instance to monitor various bits of metadata from the
transceiver. When that transceiver goes away, the sfp kernel module
will tear down that child hwmon device instance.

The sfp kernel module needs resources configured to know where to
monitor; in our case that is set up dynamically by another locally
written kernel module (which iteracts with an FPGA we have where the SFP
hardware elements reside), and that kernel module will combine
devicetree information with some run-time information to generate the
properties handed off to the sfp kernel module instantiation. We use
platform_device_register_full() to instantiate it, and that in turn
causes the swnode instance to be created. The hwmon child instance
later created by the sfp module ends up inheriting those resources,
including the swnode instance, from the sfp parent device, and when the
hwmon instance is later torn down, we end up with the use-after-free
problem due to the swnode instance's reference count being incorrectly
decremented by that child device due to the managed flag being set.

If the term SFP is unfamiliar, an explanation can be found here:

https://en.wikipedia.org/wiki/Small_Form-factor_Pluggable


>
> > via a call to platform_device_register_full() - it will create hwmon
> > child devices which get all property references. Unfortunately with
> > just a "managed" boolean in struct swnode handling this, then
> > kobject_put() gets called for the managed aspect when the child device
> > goes away instead of the parent. This leads to premature freeing of
> > the swnode structure, followed by use-after-free problems, heap
> > corruption, and generally chaos / crashes / misbehavior in the kernel.
> >
> > This commit changes that boolean into a pointer to the actual managing
> > struct device, which is then checked against the struct device
> > instance that is actually going away (via the usual call back into
> > software_node_notify_remove()). Thus the child device removal is
> > ignored as it should, and we only do the kobject_put() when the actual
> > managing struct device instance goes away. We effectively carry a
> > little bit more information now so that we can be sure to clean up
> > only when the correct struct device instance is actually going away.
> >
> > Note that while we are now keeping a pointer to a struct device here,
> > this is safe to do because the pointer itself only stays in use while
> > the pointed-to device remains valid. (So no need to be concerned
> > about additional reference counting.)
>
> The term is called "object lifetime".

Meh


>
> > Signed-off-by: Mike Isely <isely@xxxxxxxxx>
>
> ...submitter's addresses are different. Either it should be send from the
> mentioned address, or you should have
>
> From: Author <...>
> ...
>
> SoB: Author <...>
> SoB: Submitter <...>
>
> ...

This isn't the first time I've submitted patches.

The reason for SoB being isely@xxxxxxxxx is because that is my primary
email address for which I've been known by in the kernel community.
It's an address that is stable and has been associated with me since
1996. This address has a history and is associated with patches in the
past going back a long ways.

The from-address of mike.isely@xxxxxxxxxxxxxxxxx is my current work
email, which is associated with the reason for the patch. My employment
with Cobalt Digital goes back just a few years but my history with the
Linux kernel goes back significantly further.

It's actually been a while since I sent in a patch. Last time I did,
this is how I did it and it wasn't a problem then. I don't recall
seeing anything that required the from-address to equate to the SoB.
Seems like a SoB associated with a long-term known stable email is
better than something more ephemeral. If a requirement of equating the
two is the case now, then I apologize.


>
> What about the use case (don't know if it's pure theoretical or practical)
> when there is a parent and a few children and the managed swnode appears
> in one of the children? With some other dependencies between children
> it might affect how swnode is get cleaned up. Simple and regular approach
> is to cleanup children in the reversed order, but I can't say that it's
> always the case. Some children in some corner cases might have their own
> dependencies (I saw some strange devices or device drivers where the HW
> is a bit complicated and driver is written without much care).

Though I don't think that's practical, it should not be an issue either.
Bottom line, swnode instances are reference-counted by virtue of
containing a kobject instance within. So even if a managing child goes
away first, the reference count is still non-zero due to the reference
from the parent. So no risk of dangling pointers / use-after-free.
Note also that management of object lifetime via reference counting here
provides a fair amount of resilience against potential cleanup ordering
issues.

An earlier attempt at a solution I had tried was to simply dispense with
the notion of "managed" here since the reference counting is already
going to ensure cleanup when the device goes away. "Managed" in the
context of struct device are for resources that need automatic cleanup
which otherwise wouldn't already happen. So it shouldn't be strictly
even needed here. However that didn't work due to a practical
constraint: The swnode instance is created before the "managing" device
instance is ready to establish a reference to the swnode instance.
Turns out that the solution I ultimately hit upon is trivial.

>From a standpoint of design hygiene, it makes logical sense that the
entity which asserts "management" over something should be the same
entity that deasserts it. That's really all that this patch does.

-Mike
isely@xxxxxxxxx
mike.isely@xxxxxxxxxxxxxxxxx