Re: [PATCH 1/1] sofware node: Only the managing device can unreference managed software node

From: Mike Isely

Date: Fri Feb 27 2026 - 12:56:22 EST


On Thu, 26 Feb 2026, Andy Shevchenko wrote:

> On Thu, Feb 26, 2026 at 01:06:42PM -0600, Mike Isely wrote:
> > On Thu, 26 Feb 2026, Andy Shevchenko wrote:
> > > On Wed, Feb 25, 2026 at 02:16:39PM -0600, Mike Isely wrote:
> > > > On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > > > > On Wed, Feb 25, 2026 at 01:42:30PM -0600, Mike Isely wrote:
> > > > > > On Wed, 25 Feb 2026, Andy Shevchenko wrote:
> > > > > > > On Tue, Feb 24, 2026 at 01:19:22PM -0600, mike.isely@xxxxxxxxxxxxxxxxx wrote:
>
> ...
>
> > > > > > > > A scenario exists where device_create_managed_software_node() is used
> > > > > > > > to create an swnode instance that will be implicitly shared to a child
> > > > > > > > device despite best intentions not to permit such sharing (per the
> > > > > > > > comment in device_create_managed_software_node()). I encountered this
> > > > > > > > with the sfp kernel module when it was instantiated with properties
> > > > > > >
> > > > > > > SFP? Or is it the name of the actual module in the kernel?
> > > > > >
> > > > > > Actual kernel module name, sfp.ko, CONFIG_SFP in .config, named after
> > > > > > the piece of hardware it works with, an SFP cage. This is logic which
> > > > > > monitors SFP cages for hotplug appearance / removal of SFP transceivers.
> > > > > > When a transceiver appears, the sfp kernel module will create a child
> > > > > > hwmon device instance to monitor various bits of metadata from the
> > > > > > transceiver. When that transceiver goes away, the sfp kernel module
> > > > > > will tear down that child hwmon device instance.
> > > > > >
> > > > > > The sfp kernel module needs resources configured to know where to
> > > > > > monitor; in our case that is set up dynamically by another locally
> > > > > > written kernel module (which iteracts with an FPGA we have where the SFP
> > > > > > hardware elements reside), and that kernel module will combine
> > > > >
> > > > > > devicetree information with some run-time information to generate the
> > > > > > properties handed off to the sfp kernel module instantiation.
> > > > >
> > > > > What runtime information? Why this can't be done via DT overlay as others do?
> > > >
> > > > I don't recall the specifics. It might be calculation of a unit name.
> > > > The connectivity in this case is I2C so that should be a constant. We
> > > > have some variants where the FPGA is PCIE-connected to the host and so
> > > > the memory map is a run-time calculation. We have other drivers that
> > > > have to be instantiated with run-time computed properties. So we handle
> > > > this as a general case.
> > >
> > > But the configurations are semi-static, right? For the contents of FPGA we have
> > > a specific manager that reloads the FPGA configuration.
> > >
> > > Using swnode for dynamically calculated data seems weird. The data in swnodes
> > > is usually static (const), I can't remember the case where we need to supply
> > > run-time calculated values.
> >
> > Semi-static means partially dynamic...
> >
> > We use the kernel FPGA manager in one platform (zynqmp) and a userspace
> > loader program in another (PCIE connected to bcm2711 soc). However it
> > is not reasonable to reload on-the-fly as that would disrupt every piece
> > of logic (kernel modules and userspace code) currently pointing inside
> > its memory space. It would require tearing down the application space
> > and unloading every kernel module using the FPGA space (some of which is
> > ours and others which are stock kernel code). Might as well reboot - so
> > we load once at boot time. If the image needs to be swapped out (and
> > there are defined cases for this), a symbolic link is updated to point
> > to the new bitfile and the processor is rebooted.
>
> To me the (full) reversing of the kernel state seems the correct approach.
> The modules not really needs to be unloaded. The driver should be unbound
> from the device.
>
> > But this is getting off-topic.
>
> Not really, it gets more understanding on why this way was chosen and not
> another.

I think there's a miscommunication here. The nature / mechanism by how
/ when we load the FPGA has exactly nothing to do with this problem.
The issue has to do with sfp kernel module behavior when SFP
transceivers are hotplugged, which is completely unrelated to FPGA
loading. The problem would be the same no matter how we manage the
FPGA.

>
> > We actually don't directly invoke the use of swnode. It's being done
> > indirectly inside of platform_device_register_full(), when its
> > pdevinfo->properties argument is non-null, which is a normal thing that
> > happens in many other invocations of that function. See
> > drivers/base/platform.c at the call to
> > device_create_managed_software_node().
>
> I'm aware of this. I reviewed a lot of swnode code.
>
> So, the properties supplied are not const, that's what you are trying to say?
> And this I believe a problem. I don't remember if we have such a case in kernel
> (when we use dynamic properties with platform_device_register_full().

In our case it's not const, but I think you're getting hung up on that.
The const-ness of the data is not germane to the problem.

It isn't that the data is dynamic, it's that swnode is being used at all
in a scenario where a parent device is managing it and there are child
devices that come & go over time (due in our case to SFP transceiver
hotplug). That scenario happens if platform_device_register_full() is
called, with the pdevinfo->properties argument set, to create a device
that itself may later generate child devices that come and go over time.
The platform_device_register_full() implementation notices that it's
getting a properties structure via its pdevinfo argument, triggering a
usage of device_create_managed_software_node(). It doesn't matter if
the incoming properties to platform_device_register_full() are static or
not; the behavior will be the same.

We are using platform_device_register_full() to instantiate the sfp
kernel module. A grep through the kernel sources shows many other cases
of this sort of pattern (call to platform_device_register_full() with
pdevinfo->properties set), so unless every single one of those cases
happens to not create / remove child devices, this problem is not
specific to our usage of the sfp kernel module. And there's certainly
no rule written anywhere that though shalt not use
platform_device_register_full() in cases where child devices may be
created / removed...



>
> > I had no idea of even the existence of swnode until I started tracking down
> > the kernel misbehavior.
>
> > Whether statically provided or calculated, this code path is happening.
> > Our case may be dynamic, but certainly any other managed swnode usage
> > which involves child devices is going to be a reference-counting problem
> > before this patch.
>
> I have a hardware that uses drivers/mfd/intel_quark_i2c_gpio.c. Do you think
> I can reproduce the issue with it?
>
> The hierarchy is that
>
> PCI device (associated ACPI node)
> I2C host controller (associated node in ACPI and managed swnode from the driver)
> I2C-client (it has it's own ACPI node)
> GPIO controller (associated node in ACPI and managed swnode from the driver)
> GPIO controller port (child swnode from the driver)
> ...users of GPIO pins of the given port...
>

I am unfamiliar with that driver or its surroundings, so I don't know if
it is a viable means to reproduce the problem. I can spend time
analyzing that code further if you wish. However it might be more
productive to try to succinctly, generically, spell out the steps that
lead to trouble:

1. Somebody creates a struct device instance.

2. Somebody calls device_create_managed_software_node() associating it
with that struct device instance. This causes two kobject references to
be counted against the swnode instance (one for the struct device and
one because the managed flag was set). refcount=2

3. Some time later, a child struct device is created and the properties
of the parent are shared with the child. This causes another kobject
reference to be counted against the swnode instance. refcount=3

4. Some time later, that child struct device instance goes away (perhaps
due to a hotplug removal). During the tear-down,
software_node_notify_remove() notices that the managed field of the
swnode instance is set, so TWO kobject references are removed instead of
the single reference created by the previous step. refcount=1

5. Repeat step 3. refcount=2

6. Repeat step 4. refcount=0

7. Now the kobject reference count inside the swnode instance has
dropped to zero and so the swnode instance is released. Notice that the
parent device is still holding a physical reference, now pointing into
freed memory. Chaos likely follows. If lucky, a NULL pointer kernel
oops is the result.

I confirmed this exact sequence of behavior in the context of the sfp
kernel module with appropriate printk debug trace added. While that's
going to be hard for you to reproduce without sfp hardware, anything
else which performs those steps should produce similar results.

The patch fixes this by causing step 4 to only do the second reference
decrement if the device being torn down is the same one that established
the managing relationship in the first place. Then everything stays
balanced.

Notice that this has nothing to do with whether or not the properties
are const, nor does it have anything to do with FPGA management. I
apologize if I caused confusion leading to that. It's purely the
combination of a managed swnode instance used for a device that will
create / remove child devices over time (due perhaps to hotplug
activity).

And because platform_device_register_full() will implicitly employ a
managed swnode instance any time it is passed properties in its pdevinfo
argument, then by implication anyone calling
platform_device_register_full() in that manner - which happens in
multiple other places in the kernel - is at risk for the same issue.

-Mike