Re: [PATCH 1/1] sofware node: Only the managing device can unreference managed software node

From: Mike Isely

Date: Wed Feb 25 2026 - 15:17:07 EST


On Wed, 25 Feb 2026, Andy Shevchenko wrote:

> On Wed, Feb 25, 2026 at 01:42:30PM -0600, Mike Isely wrote:
> > On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > > On Tue, Feb 24, 2026 at 01:19:22PM -0600, mike.isely@xxxxxxxxxxxxxxxxx wrote:
>
> ...
>
> > > > A scenario exists where device_create_managed_software_node() is used
> > > > to create an swnode instance that will be implicitly shared to a child
> > > > device despite best intentions not to permit such sharing (per the
> > > > comment in device_create_managed_software_node()). I encountered this
> > > > with the sfp kernel module when it was instantiated with properties
> > >
> > > SFP? Or is it the name of the actual module in the kernel?
> >
> > Actual kernel module name, sfp.ko, CONFIG_SFP in .config, named after
> > the piece of hardware it works with, an SFP cage. This is logic which
> > monitors SFP cages for hotplug appearance / removal of SFP transceivers.
> > When a transceiver appears, the sfp kernel module will create a child
> > hwmon device instance to monitor various bits of metadata from the
> > transceiver. When that transceiver goes away, the sfp kernel module
> > will tear down that child hwmon device instance.
> >
> > The sfp kernel module needs resources configured to know where to
> > monitor; in our case that is set up dynamically by another locally
> > written kernel module (which iteracts with an FPGA we have where the SFP
> > hardware elements reside), and that kernel module will combine
>
> > devicetree information with some run-time information to generate the
> > properties handed off to the sfp kernel module instantiation.
>
> What runtime information? Why this can't be done via DT overlay as others do?

I don't recall the specifics. It might be calculation of a unit name.
The connectivity in this case is I2C so that should be a constant. We
have some variants where the FPGA is PCIE-connected to the host and so
the memory map is a run-time calculation. We have other drivers that
have to be instantiated with run-time computed properties. So we handle
this as a general case.


>
> > We use platform_device_register_full() to instantiate it, and that in turn
> > causes the swnode instance to be created. The hwmon child instance
> > later created by the sfp module ends up inheriting those resources,
> > including the swnode instance, from the sfp parent device, and when the
> > hwmon instance is later torn down, we end up with the use-after-free
> > problem due to the swnode instance's reference count being incorrectly
> > decremented by that child device due to the managed flag being set.
> >
> > If the term SFP is unfamiliar, an explanation can be found here:
> >
> > https://en.wikipedia.org/wiki/Small_Form-factor_Pluggable
> >
> > > > via a call to platform_device_register_full() - it will create hwmon
> > > > child devices which get all property references. Unfortunately with
> > > > just a "managed" boolean in struct swnode handling this, then
> > > > kobject_put() gets called for the managed aspect when the child device
> > > > goes away instead of the parent. This leads to premature freeing of
> > > > the swnode structure, followed by use-after-free problems, heap
> > > > corruption, and generally chaos / crashes / misbehavior in the kernel.
> > > >
> > > > This commit changes that boolean into a pointer to the actual managing
> > > > struct device, which is then checked against the struct device
> > > > instance that is actually going away (via the usual call back into
> > > > software_node_notify_remove()). Thus the child device removal is
> > > > ignored as it should, and we only do the kobject_put() when the actual
> > > > managing struct device instance goes away. We effectively carry a
> > > > little bit more information now so that we can be sure to clean up
> > > > only when the correct struct device instance is actually going away.
> > > >
> > > > Note that while we are now keeping a pointer to a struct device here,
> > > > this is safe to do because the pointer itself only stays in use while
> > > > the pointed-to device remains valid. (So no need to be concerned
> > > > about additional reference counting.)
> > >
> > > The term is called "object lifetime".
> >
> > Meh
> >
> > > > Signed-off-by: Mike Isely <isely@xxxxxxxxx>
> > >
> > > ...submitter's addresses are different. Either it should be send from the
> > > mentioned address, or you should have
> > >
> > > From: Author <...>
> > > ...
> > >
> > > SoB: Author <...>
> > > SoB: Submitter <...>
> > >
> > > ...
> >
> > This isn't the first time I've submitted patches.
> >
> > The reason for SoB being isely@xxxxxxxxx is because that is my primary
> > email address for which I've been known by in the kernel community.
> > It's an address that is stable and has been associated with me since
> > 1996. This address has a history and is associated with patches in the
> > past going back a long ways.
> >
> > The from-address of mike.isely@xxxxxxxxxxxxxxxxx is my current work
> > email, which is associated with the reason for the patch. My employment
> > with Cobalt Digital goes back just a few years but my history with the
> > Linux kernel goes back significantly further.
> >
> > It's actually been a while since I sent in a patch. Last time I did,
> > this is how I did it and it wasn't a problem then.
>
> This is standard check in Linux Next, missed SoB by submitter (same
> exists with committer). It's all documented and I'm surprised nobody
> told you that before.
>
> > I don't recall
> > seeing anything that required the from-address to equate to the SoB.
>
> From goes to Author field into Git, Author must provide a DCO. This all
> is being required from day 1.
>
> > Seems like a SoB associated with a long-term known stable email is
> > better than something more ephemeral. If a requirement of equating the
> > two is the case now, then I apologize.
>
> Documentation/process/submitting-patches.rst:
>
> "Note, the From: tag is optional when the From: author is also the person (and
> email) listed in the From: line of the email header."
>
> There are also examples
>
> """
> From: From Author <from@xxxxxxxxxxxxxxxxxx>
>
> <changelog>
>
> Co-developed-by: Random Co-Author <random@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Random Co-Author <random@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: From Author <from@xxxxxxxxxxxxxxxxxx>
> Co-developed-by: Submitting Co-Author <sub@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Submitting Co-Author <sub@xxxxxxxxxxxxxxxxxxxx>
> """
>
> Does it help?

Would SoB lines by *both* addresses (even though it's the same person)
clear that hurdle?

So noted.


>
> > > What about the use case (don't know if it's pure theoretical or practical)
> > > when there is a parent and a few children and the managed swnode appears
> > > in one of the children? With some other dependencies between children
> > > it might affect how swnode is get cleaned up. Simple and regular approach
> > > is to cleanup children in the reversed order, but I can't say that it's
> > > always the case. Some children in some corner cases might have their own
> > > dependencies (I saw some strange devices or device drivers where the HW
> > > is a bit complicated and driver is written without much care).
> >
> > Though I don't think that's practical, it should not be an issue either.
> > Bottom line, swnode instances are reference-counted by virtue of
> > containing a kobject instance within. So even if a managing child goes
> > away first, the reference count is still non-zero due to the reference
> > from the parent. So no risk of dangling pointers / use-after-free.
> > Note also that management of object lifetime via reference counting here
> > provides a fair amount of resilience against potential cleanup ordering
> > issues.
> >
> > An earlier attempt at a solution I had tried was to simply dispense with
> > the notion of "managed" here since the reference counting is already
> > going to ensure cleanup when the device goes away. "Managed" in the
> > context of struct device are for resources that need automatic cleanup
> > which otherwise wouldn't already happen. So it shouldn't be strictly
> > even needed here. However that didn't work due to a practical
> > constraint: The swnode instance is created before the "managing" device
> > instance is ready to establish a reference to the swnode instance.
> > Turns out that the solution I ultimately hit upon is trivial.
> >
> > From a standpoint of design hygiene, it makes logical sense that the
> > entity which asserts "management" over something should be the same
> > entity that deasserts it. That's really all that this patch does.
>
> OK. Thanks for elaboration.
>
>