Re: [PATCH v4 1/9] driver core: Don't let a device probe until it's ready

From: Doug Anderson

Date: Mon Apr 06 2026 - 15:07:23 EST


Hi,

On Mon, Apr 6, 2026 at 11:11 AM Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
>
> On Mon Apr 6, 2026 at 7:06 PM CEST, Marc Zyngier wrote:
> > On Mon, 06 Apr 2026 17:43:22 +0100,
> > "Danilo Krummrich" <dakr@xxxxxxxxxx> wrote:
> >>
> >> On Mon Apr 6, 2026 at 6:34 PM CEST, Marc Zyngier wrote:
> >> > On Mon, 06 Apr 2026 15:41:08 +0100,
> >> > Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Sun, Apr 5, 2026 at 11:32 PM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >> >> >
> >> >> > > + * blocked those attempts. Now that all of the above initialization has
> >> >> > > + * happened, unblock probe. If probe happens through another thread
> >> >> > > + * after this point but before bus_probe_device() runs then it's fine.
> >> >> > > + * bus_probe_device() -> device_initial_probe() -> __device_attach()
> >> >> > > + * will notice (under device_lock) that the device is already bound.
> >> >> > > + */
> >> >> > > + dev_set_ready_to_probe(dev);
> >> >> >
> >> >> > I think this lacks some ordering properties that we should be allowed
> >> >> > to rely on. In this case, the 'ready_to_probe' flag being set should
> >> >> > imply that all of the data structures are observable by another CPU.
> >> >> >
> >> >> > Unfortunately, this doesn't seem to be the case, see below.
> >> >>
> >> >> I agree. I think Danilo was proposing fixing this by just doing:
> >> >>
> >> >> device_lock(dev);
> >> >> dev_set_ready_to_probe(dev);
> >> >> device_unlock(dev);
> >> >>
> >> >> While that's a bit of an overkill, it also works I think. Do folks
> >> >> have a preference for what they'd like to see in v5?
> >> >
> >> > It would work, but I find the construct rather obscure, and it implies
> >> > that there is a similar lock taken on the read path. Looking at the
> >> > code for a couple of minutes doesn't lead to an immediate clue that
> >> > such lock is indeed taken on all read paths.
> >>
> >> Why do you think this is obscure?
> >
> > Because you're not using the lock to protect any data. You're using
> > the lock for its release effect. Yes, it works. But the combination of
> > atomics *and* locking is just odd. You normally pick one model or the
> > other, not a combination of both.
>
> Yeah, the choice of bitops was purely because previously (in v2) this was a C
> bitfield member in struct device protected with the device lock. But, not all of
> the bitfield members were protected by the same lock or protected by a lock at
> all, which would have made this racy with the other bitfield members. I.e. the
> choice of bitops was independent; see also [2] for context.
>
> [2] https://lore.kernel.org/driver-core/DHH1PD0ASG8H.1K3KG9L658DYN@xxxxxxxxxx/

I've changed the snippet in the commit description to now justify the
use of bitops like this:

Instead of adding another flag to the bitfields already in "struct
device", add a new "flags" field and use that. This allows us to
freely change the bit from different threads without worrying about
corrupting nearby bits (and means threads changing other bits won't
corrupt us).
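
As a rough userspace illustration of that reasoning (not the kernel
code), the sketch below uses C11 atomics in place of the kernel's
set_bit()/test_bit(). Every demo_* name is hypothetical and the bit
numbers are made up. Each update is an atomic read-modify-write of a
single bit, so two threads touching different bits of the same word
cannot lose each other's update the way neighbouring C bitfield members
can. Like set_bit(), the relaxed atomics here give atomicity only and
no ordering, which is why the ordering question is handled separately.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real names in the patch may differ. */
enum demo_dev_flag {
	DEMO_READY_TO_PROBE = 0,
	DEMO_OTHER_FLAG     = 1,
};

/* Stand-in for "struct device" with a dedicated atomic flags word. */
struct demo_device {
	atomic_ulong flags;
};

/* Userspace analogue of set_bit(): atomic RMW of one bit, no ordering. */
static inline void demo_set_flag(struct demo_device *dev,
				 enum demo_dev_flag bit)
{
	atomic_fetch_or_explicit(&dev->flags, 1UL << bit,
				 memory_order_relaxed);
}

/* Userspace analogue of test_bit(): plain atomic load of the word. */
static inline bool demo_test_flag(struct demo_device *dev,
				  enum demo_dev_flag bit)
{
	unsigned long v = atomic_load_explicit(&dev->flags,
					       memory_order_relaxed);

	return (v & (1UL << bit)) != 0;
}
```

Setting DEMO_OTHER_FLAG from one thread and DEMO_READY_TO_PROBE from
another leaves both bits set regardless of interleaving.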


> >> As I mentioned in [1], the whole purpose of
> >> dev_set_ready_to_probe() is to protect against a concurrent probe() attempt of
> >> driver_attach() in __driver_probe_device(), and the fact that
> >> __driver_probe_device() is protected by the device lock is by design.
> >>
> >> [1] https://lore.kernel.org/driver-core/DHM5TCBT6GDE.EFG3IPRP99G7@xxxxxxxxxx/
> >
> > I don't have much skin in this game, and you seem to have strong
> > opinions about how these things are supposed to work. So whatever
> > floats your boat, as long as it is correct.
>
> Not overly, it's more about calling out the fact that probe() paths are
> serialized through the device lock by design, so it seems natural to protect
> dev_set_ready_to_probe() with the device lock.
>
> The fact that dev_set_ready_to_probe() uses a bitop under the hood is an
> implementation detail, i.e. it could also be an independent boolean.
>
> That said, as I caught the issue in [3], I also mentioned the option of an
> explicit memory barrier in device_add() and __driver_probe_device(). I.e. I'm
> not entirely against it, but I think the device lock is a bit cleaner.
>
> [3] https://lore.kernel.org/driver-core/DHLITCTY913U.J59JSQOVL0NH@xxxxxxxxxx/

I've got the series all prepped and it sounds as if the alignment is
on using device_lock(). I'll give it a few more hours in case there
are additional responses, then send a v5. ;-)
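
For anyone following along, here is a minimal userspace sketch of why
the lock-based approach gives the ordering Marc asked about. It is an
analogy only: a pthread mutex stands in for the device lock, and all
sketch_* names are hypothetical rather than taken from the series. The
writer finishes its initialization, then sets the flag while holding
the lock. A reader that takes the same lock and sees the flag set is
ordered after the writer's unlock, so it also sees the initialized
data.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_device {
	pthread_mutex_t lock;		/* stands in for the device lock */
	atomic_bool ready_to_probe;	/* stands in for the new flag bit */
	int init_data;			/* state set up before publishing */
};

/* Writer side, loosely modelled on the end of device_add(). */
static void sketch_publish(struct sketch_device *dev, int value)
{
	dev->init_data = value;		/* initialization happens first */

	pthread_mutex_lock(&dev->lock);
	atomic_store_explicit(&dev->ready_to_probe, true,
			      memory_order_relaxed);
	pthread_mutex_unlock(&dev->lock);	/* release: publishes init_data */
}

/*
 * Reader side, loosely modelled on a probe attempt made under the lock.
 * Returns true and fills *out only if the device was marked ready.
 */
static bool sketch_try_probe(struct sketch_device *dev, int *out)
{
	bool ready;

	pthread_mutex_lock(&dev->lock);		/* acquire */
	ready = atomic_load_explicit(&dev->ready_to_probe,
				     memory_order_relaxed);
	if (ready)
		*out = dev->init_data;
	pthread_mutex_unlock(&dev->lock);

	return ready;
}
```

A probe attempt that runs before sketch_publish() sees the flag clear
and backs off without touching init_data; one that runs afterwards sees
both the flag and the data.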

-Doug