Re: [PATCH v2] driver core: Don't let a device probe until it's ready
From: Alan Stern
Date: Mon Mar 30 2026 - 10:58:31 EST
On Mon, Mar 30, 2026 at 07:28:41AM -0700, Douglas Anderson wrote:
> The moment we link a "struct device" into the list of devices for the
> bus, it's possible probe can happen. This is because another thread
> can load the driver at any time and that can cause the device to
> probe. This has been seen in practice with a stack crawl that looks
> like this [1]:
>
> really_probe()
> __driver_probe_device()
> driver_probe_device()
> __driver_attach()
> bus_for_each_dev()
> driver_attach()
> bus_add_driver()
> driver_register()
> __platform_driver_register()
> init_module() [some module]
> do_one_initcall()
> do_init_module()
> load_module()
> __arm64_sys_finit_module()
> invoke_syscall()
>
> As a result of the above, it was seen that device_links_driver_bound()
> could be called for the device before "dev->fwnode->dev" was
> assigned. This prevented __fw_devlink_pickup_dangling_consumers() from
> being called which meant that other devices waiting on our driver's
> sub-nodes were stuck deferring forever.
>
> It's believed that this problem is showing up suddenly for two
> reasons:
> 1. Android has recently (last ~1 year) implemented an optimization to
> the order it loads modules [2]. When devices opt-in to this faster
> loading, modules are loaded one-after-the-other very quickly. This
> is unlike how other distributions do it. The reproduction of this
> problem has only been seen on devices that opt-in to Android's
> "parallel module loading".
> 2. Android devices typically opt-in to fw_devlink, and the most
> noticeable issue is the NULL "dev->fwnode->dev" in
> device_links_driver_bound(). fw_devlink is somewhat new code and
> also not in use by all Linux devices.
>
> Even though the specific symptom where "dev->fwnode->dev" wasn't
> assigned could be fixed by moving that assignment higher in
> device_add(), other parts of device_add() (like the call to
> device_pm_add()) are also important to run before probe. Only moving
> the "dev->fwnode->dev" assignment would likely fix the current
> symptoms but lead to difficult-to-debug problems in the future.
>
> Fix the problem by preventing probe until device_add() has run far
> enough that the device is ready to probe. If somehow we end up trying
> to probe before we're allowed, __driver_probe_device() will return
> -EPROBE_DEFER which will make certain the device is noticed.
>
> In the race condition that was seen with Android's faster module
> loading, we will temporarily add the device to the deferred list and
> then take it off immediately when device_add() probes the device.
>
> [1] Captured on a machine running a downstream 6.6 kernel
> [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing")
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
Reviewed-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>