Re: [PATCH] driver core: Fix uevent_show() vs driver detach race

From: Dan Williams
Date: Sat Jul 13 2024 - 13:22:53 EST


Tetsuo Handa wrote:
> On 2024/07/13 8:49, Dan Williams wrote:
> >>> + /* Synchronize with dev_uevent() */
> >>> + synchronize_rcu();
> >>> +
> >>
> >> this synchronize_rcu(), in order to make sure that
> >> READ_ONCE(dev->driver) in dev_uevent() observes NULL?
> >
> > No, this synchronize_rcu() is to make sure that if dev_uevent() wins the
> > race and observes that dev->driver is not NULL that it is still safe to
> > dereference that result because the 'struct device_driver' object is
> > still live.
>
> I can't catch what the pair of rcu_read_lock()/rcu_read_unlock() in dev_uevent()
> and synchronize_rcu() in module_remove_driver() is for.

It is to extend the lifetime of @driver if dev_uevent() observes
non-NULL @dev->driver.

> I think that the below race is possible.
> Please explain how "/* Synchronize with module_remove_driver() */" works.

It is for this race:

Thread1:                                Thread2:
dev_uevent(...)                         delete_module()
  driver = dev->driver;                   mod->exit()
  if (driver)                               driver_unregister()
                                              driver_detach() // <-- @dev->driver marked NULL
                                              module_remove_driver()
                                          free_module() // <-- @driver object destroyed
    add_uevent_var(env, "DRIVER=%s", driver->name); // <-- use after free of @driver

If driver_detach() happens before Thread1 reads dev->driver then Thread1
sees NULL and there is no use-after-free risk.

The previous attempt to fix this held the device_lock() over
dev_uevent() which prevents driver_detach() from even starting, but that
causes lockdep issues and is even more heavy-handed than the
synchronize_rcu() delay. RCU makes sure that @driver stays alive between
reading @dev->driver and reading @driver->name.
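
For illustration only, here is a minimal userspace sketch of that
ordering. It is not the kernel code: the toy_* names are hypothetical,
and the reader counter is a crude stand-in for RCU that only models
"the writer waits for readers that might still hold the old pointer"
(it spins, and a steady stream of readers could starve it). The reader
models dev_uevent() and the writer models the detach, wait, free
sequence on the module unload path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for RCU (assumption: a global reader count is enough
 * to model the grace period; real RCU works very differently). */
static atomic_int toy_readers;

static void toy_rcu_read_lock(void)   { atomic_fetch_add(&toy_readers, 1); }
static void toy_rcu_read_unlock(void) { atomic_fetch_sub(&toy_readers, 1); }

/* Spin until no reader is inside a read-side critical section. */
static void toy_synchronize_rcu(void)
{
	while (atomic_load(&toy_readers) != 0)
		;
}

/* Hypothetical miniature versions of the driver core structures. */
struct toy_driver { char name[16]; };
struct toy_device { _Atomic(struct toy_driver *) driver; };

/* Reader side, modelled on dev_uevent(): the load of dev->driver and
 * the dereference of driver->name sit inside one read-side section. */
static int toy_uevent(struct toy_device *dev, char *buf, size_t len)
{
	struct toy_driver *driver;
	int n = 0;

	assert(buf && len);
	buf[0] = '\0';

	toy_rcu_read_lock();
	driver = atomic_load(&dev->driver);
	if (driver)
		n = snprintf(buf, len, "DRIVER=%s", driver->name);
	toy_rcu_read_unlock();

	return n;
}

/* Writer side: clear the pointer (driver_detach()), wait for readers
 * that may have seen the old value, and only then free the object. */
static void toy_unload(struct toy_device *dev)
{
	struct toy_driver *driver = atomic_exchange(&dev->driver, NULL);

	toy_synchronize_rcu();
	free(driver);
}
```

A reader that loaded a non-NULL pointer is still counted when the
writer reaches toy_synchronize_rcu(), so the free cannot overtake the
read of driver->name. A reader that arrives later sees NULL and never
touches the object.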