Re: [RFC PATCH] driver core: Don't link the device to the bus until we're ready to probe
From: Doug Anderson
Date: Mon Mar 23 2026 - 18:25:35 EST
Hi,
On Mon, Mar 23, 2026 at 10:47 AM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 23, 2026 at 10:07:01AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Sat, Mar 21, 2026 at 8:54 AM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > > > I'd also note that the only actual symptom we're seeing is with
> > > > > > fw_devlink misbehaving (because dev->fwnode->dev wasn't set early
> > > > > > enough). fw_devlink is a "new" (ish) feature, is officially optional,
> > > > > > and isn't used on all hardware.
> > > > >
> > > > > That's true too, can we set that earlier?
> > > >
> > > > Yes, I can post a patch that _just_ moves the set of dev->fwnode->dev
> > > > earlier, and that will probably fix my symptoms (I'll need to test).
> > > > This patch already moves it a bit earlier, but if we don't break the
> > > > linking out as a separate step it would need to move even higher up in
> > > > the function.
> > > >
> > > > Originally, I was going to just propose that, but then I realized that
> > > > some of the other code in device_add() probably also ought to run
> > > > before we let the driver probe, and hence I ended up with this patch.
> > >
> > > This sounds like a more generic problem. A bunch of things happen after
> > > bus_add_device() that should be completed before probing can start; the
> > > firmware node stuff is just one of them.
>
> You know, I wrote this but I'm not so sure that it's accurate. We've
> gone many years with no big changes to this code; most likely it doesn't
> need alterations now.
That's fair, but I'm really just worried that the Android parallel
module loading code, which is only ~1 year old (and needs to be opted
in for each device) is stressing things in a way that nobody else is.
In most distros, I think modules are loaded "on-demand". A device gets
added first, and then we figure out which module has the driver that's
needed and load the module. Nice and sequential. Android seems to have
a different approach. As far as I understand it just has a list of
modules to load and slams through loading all of them even as devices
are still being added.
I don't think that what Android is doing is technically "wrong", but
it's certainly odd compared to all other distros.
Unless we say that what Android is doing is wrong (and decide what to
do about it), it seems like we need to make sure it's robust.
> > > Splitting bus_add_device() in two sounds reasonable, although I would
> > > rename the old routine to bus_link_device, since all it does it add some
> > > groups and symlinks. The new routine can be called bus_add_device().
> >
> > LOL, so basically you want them named exactly opposite I did? That's
> > OK with me, though maybe:
> >
> > bus_prep_device()
> > bus_add_device()
>
> Nothing wrong with those names. But instead of making these fairly
> intrusive changes it might be better just to move the firmware stuff to
> a different place in the code (you mentioned this possibility in your
> first email). It would be a smaller change, that's for sure.
I'll do that if that's what everyone wants, but the more I think about
it the more worried I am that we'll end up with a hidden / harder to
debug problem where some driver gets unhappy when its probe is called
before dpm_sysfs_add(), device_pm_add(), device_create_file(),
device_create_sys_dev_entry(), BUS_NOTIFY_ADD_DEVICE, ...
> > > The real question is whether any of the other stuff that happens before
> > > bus_probe_device() needs to come after the device is added to the bus's
> > > list. The bus_notify() and kobject_uevent() calls are good examples; I
> > > don't know what their requirements are. Should they be moved down,
> > > between the new bus_add_device() and bus_probe_device()?
> >
> > Yes, this is my question too. The question is, what's worse:
> >
> > 1. Potentially the device getting probed before the calls to
> > "bus_notify(dev, BUS_NOTIFY_ADD_DEVICE)" and
> > "kobject_uevent(&dev->kobj, KOBJ_ADD)"
> >
> > 2. Calling "bus_notify(dev, BUS_NOTIFY_ADD_DEVICE)" and
> > "kobject_uevent(&dev->kobj, KOBJ_ADD)" without first adding to the
> > subsystem's list of devices.
> >
> > As it is, my patch says #2 is worse and thus allows for #1. ...but I
> > don't actually know the answer. The main reason I chose to allow for
> > #1 is that it makes the behavior less different than it was before my
> > patch, but that doesn't mean it's correct.
> >
> > Reading through a handful of "BUS_NOTIFY_ADD_DEVICE" calls, my guess
> > is that they expect to be called before probe... That would imply that
> > my patch made the wrong choice...
>
> I don't know about other subsystems, only USB. As far as I remember,
> USB doesn't really care about the order of probing vs. notifications.
> In general, since probing can be asynchronous with registration, the
> kernel always has to work with probing after bus_notify(). It's
> best to preserve that order unless there's a very good reason not to.
>
> uevents may be different, since they go to userspace. It could be
> weird for a user program to be told about a new device and then not be
> able to find it on the bus.
>
> > I think another option here might be to just add a new bitfield flag
> > to "struct device", like "ready_to_probe". Right before the call to
> > bus_probe_device(dev), we could do something like:
> >
> > device_lock(dev);
> > dev->ready_to_probe = true;
> > device_unlcok(dev);
> >
> > Then in __driver_attach() I can have:
> >
> > device_lock(dev);
> > ready_to_probe = dev->ready_to_probe;
> > device_unlock(dev);
> > if (!ready_to_probe)
> > return 0;
> >
> > If I do that, I don't think I'll even need to re-order anything, and I
> > think it's all safe. Basically, it just the device hidden from the
> > __driver_attach() logic until probe is ready.
> >
> > What do you think?
>
> There should not be any difference between probing caused by the device
> being added to the bus, vs. caused by a new driver being registered, vs.
> caused by anything else (such as sysfs). None of these should be
> allowed until all of them can be handled properly.
Right. ...and I think that's what my proposed "ready_to_probe" does.
It really does seem like quite a safe change. It _just_ prevents the
driver load path from initiating a probe too early.
> And linking the device into the bus's list of devices should be the
> event that makes probing possible.
Sure, but moving the linking into the bus's list of devices all the
way to the end is definitely a bigger change. If nothing else,
"bus_for_each_dev()" starts to be able to find the device once it's
linked into the list. If any of the ~50 drivers who register for
BUS_NOTIFY_ADD_DEVICE are relying on the device to show up in
"bus_for_each_dev()", it would be bad...
-Doug