Re: [PATCH 1/1] sofware node: Only the managing device can unreference managed software node
From: Andy Shevchenko
Date: Thu Feb 26 2026 - 02:17:32 EST
On Wed, Feb 25, 2026 at 02:16:39PM -0600, Mike Isely wrote:
> On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > On Wed, Feb 25, 2026 at 01:42:30PM -0600, Mike Isely wrote:
> > > On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > > > On Tue, Feb 24, 2026 at 01:19:22PM -0600, mike.isely@xxxxxxxxxxxxxxxxx wrote:
...
> > > > > A scenario exists where device_create_managed_software_node() is used
> > > > > to create an swnode instance that will be implicitly shared to a child
> > > > > device despite best intentions not to permit such sharing (per the
> > > > > comment in device_create_managed_software_node()). I encountered this
> > > > > with the sfp kernel module when it was instantiated with properties
> > > >
> > > > SFP? Or is it the name of the actual module in the kernel?
> > >
> > > Actual kernel module name, sfp.ko, CONFIG_SFP in .config, named after
> > > the piece of hardware it works with, an SFP cage. This is logic which
> > > monitors SFP cages for hotplug appearance / removal of SFP transceivers.
> > > When a transceiver appears, the sfp kernel module will create a child
> > > hwmon device instance to monitor various bits of metadata from the
> > > transceiver. When that transceiver goes away, the sfp kernel module
> > > will tear down that child hwmon device instance.
> > >
> > > The sfp kernel module needs resources configured to know where to
> > > monitor; in our case that is set up dynamically by another locally
> > > written kernel module (which iteracts with an FPGA we have where the SFP
> > > hardware elements reside), and that kernel module will combine
> >
> > > devicetree information with some run-time information to generate the
> > > properties handed off to the sfp kernel module instantiation.
> >
> > What runtime information? Why this can't be done via DT overlay as others do?
>
> I don't recall the specifics. It might be calculation of a unit name.
> The connectivity in this case is I2C so that should be a constant. We
> have some variants where the FPGA is PCIE-connected to the host and so
> the memory map is a run-time calculation. We have other drivers that
> have to be instantiated with run-time computed properties. So we handle
> this as a general case.
But the configurations are semi-static, right? For the contents of FPGA we have
a specific manager that reloads the FPGA configuration.
Using swnode for dynamically calculated data seems weird. The data in swnodes
is usually static (const), I can't remember the case where we need to supply
run-time calculated values.
Since that, DT overlay approach seems fine, no?
> > > We use platform_device_register_full() to instantiate it, and that in turn
> > > causes the swnode instance to be created. The hwmon child instance
> > > later created by the sfp module ends up inheriting those resources,
> > > including the swnode instance, from the sfp parent device, and when the
> > > hwmon instance is later torn down, we end up with the use-after-free
> > > problem due to the swnode instance's reference count being incorrectly
> > > decremented by that child device due to the managed flag being set.
> > >
> > > If the term SFP is unfamiliar, an explanation can be found here:
> > >
> > > https://en.wikipedia.org/wiki/Small_Form-factor_Pluggable
> > >
> > > > > via a call to platform_device_register_full() - it will create hwmon
> > > > > child devices which get all property references. Unfortunately with
> > > > > just a "managed" boolean in struct swnode handling this, then
> > > > > kobject_put() gets called for the managed aspect when the child device
> > > > > goes away instead of the parent. This leads to premature freeing of
> > > > > the swnode structure, followed by use-after-free problems, heap
> > > > > corruption, and generally chaos / crashes / misbehavior in the kernel.
> > > > >
> > > > > This commit changes that boolean into a pointer to the actual managing
> > > > > struct device, which is then checked against the struct device
> > > > > instance that is actually going away (via the usual call back into
> > > > > software_node_notify_remove()). Thus the child device removal is
> > > > > ignored as it should, and we only do the kobject_put() when the actual
> > > > > managing struct device instance goes away. We effectively carry a
> > > > > little bit more information now so that we can be sure to clean up
> > > > > only when the correct struct device instance is actually going away.
> > > > >
> > > > > Note that while we are now keeping a pointer to a struct device here,
> > > > > this is safe to do because the pointer itself only stays in use while
> > > > > the pointed-to device remains valid. (So no need to be concerned
> > > > > about additional reference counting.)
> > > >
> > > > The term is called "object lifetime".
> > >
> > > Meh
...
> Would SoB lines by *both* addresses (even though it's the same person)
> clear that hurdle?
Yes.
--
With Best Regards,
Andy Shevchenko