Re: [PATCH v2] driver core: Don't let a device probe until it's ready

From: Danilo Krummrich

Date: Tue Mar 31 2026 - 10:44:20 EST


On Mon Mar 30, 2026 at 4:28 PM CEST, Douglas Anderson wrote:
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 09b98f02f559..4caa3fd1ecdb 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3688,6 +3688,21 @@ int device_add(struct device *dev)
> fw_devlink_link_device(dev);
> }
>
> + /*
> + * The moment the device was linked into the bus's "klist_devices" in
> + * bus_add_device() then it's possible that probe could have been
> + * attempted in a different thread via userspace loading a driver
> + * matching the device. "ready_to_probe" being false would have blocked
> + * those attempts. Now that all of the above initialization has
> + * happened, unblock probe. If probe happens through another thread
> + * after this point but before bus_probe_device() runs then it's fine.
> + * bus_probe_device() -> device_initial_probe() -> __device_attach()
> + * will notice (under device_lock) that the device is already bound.
> + */
> + device_lock(dev);
> + dev->ready_to_probe = true;
> + device_unlock(dev);
> +
> bus_probe_device(dev);
>
> /*
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 37c7e54e0e4c..a1762254828f 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
> if (dev->driver)
> return -EBUSY;
>
> + /*
> + * In device_add(), the "struct device" gets linked into the subsystem's
> + * list of devices and broadcast to userspace (via uevent) before we're
> + * quite ready to probe. Those open pathways to driver probe before
> + * we've finished enough of device_add() to reliably support probe.
> + * Detect this and tell other pathways to try again later. device_add()
> + * itself will also try to probe immediately after setting
> + * "ready_to_probe".
> + */
> + if (!dev->ready_to_probe)
> + return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");

Are we sure this dev->ready_to_probe dance does not introduce a new subtle bug
considering that ready_to_probe is within a bitfield of struct device?

I.e. are we sure there are no potential concurrent modifications of other fields
in this bitfield that are not protected with the device lock?

For instance, in __driver_attach() we set dev->can_match if
driver_match_device() returns -EPROBE_DEFER without the device lock held.

This is exactly the case you want to protect against, i.e. device_add() racing
with __driver_attach().

So, there is a chance that the dev->ready_to_probe change gets interleaved with
a dev->can_match change.

I think all this goes away if we stop using bitfields for synchronization; we
should convert some of those to flags that we can modify with set_bit() and
friends instead.

> +
> dev->can_match = true;
> dev_dbg(dev, "bus: '%s': %s: matched device with driver %s\n",
> drv->bus->name, __func__, drv->name);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index e65d564f01cd..e2f83384b627 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -553,6 +553,8 @@ struct device_physical_location {
> * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers.
> * @dma_iommu: Device is using default IOMMU implementation for DMA and
> * doesn't rely on dma_ops structure.
> + * @ready_to_probe: If set to %true then device_add() has finished enough
> + * initialization that probe could be called.
> *
> * At the lowest level, every device in a Linux system is represented by an
> * instance of struct device. The device structure contains the information
> @@ -675,6 +677,7 @@ struct device {
> #ifdef CONFIG_IOMMU_DMA
> bool dma_iommu:1;
> #endif
> + bool ready_to_probe:1;
> };