Re: [PATCH] drivers: core: Make dev->driver usage safe in dev_uevent()

From: Eugeniu Rosca
Date: Tue Apr 30 2024 - 09:18:24 EST


Hello Greg,

On Tue, Apr 30, 2024 at 10:27:19AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 30, 2024 at 10:17:54AM +0200, Eugeniu Rosca wrote:
> > Hi Greg,
> >
> > On Tue, Apr 30, 2024 at 09:20:10AM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Apr 30, 2024 at 06:55:31AM +0200, Dirk Behme wrote:
> > > > Inspired by the function dev_driver_string() in the same file make sure
> > > > in case of uninitialization dev->driver is used safely in dev_uevent(),
> > > > as well.
> > >
> > > I think you are racing and just getting "lucky" with your change here,
> > > just like dev_driver_string() is doing there (that READ_ONCE() really
> > > isn't doing much to protect you...)
> >
> > I hope below details shed more details on the repro:
> > https:// gist.github.com/erosca/1e8a87fbcc9e5ad0fecd32ebcb6266c3
>
> Sometimes I only have access to email, nothing else, please include in
> the email the full details.

Will follow your preference in the future.

>
> > To improve the occurrence rate:
> > - a dummy ds90ux9xx-dummy driver was used
> > - a dummy i2c node was added to DTS
> > - a dummy pr_alert() was added to dev_uevent() @ drivers/base/core.c
> > - UBSAN + KASAN enabled in .config
>
> So this is entirely fake? No real device or driver ever causes this
> problem?

Of course not. This issue is impacting the end user by resetting the HW
target once in a couple of months. Our synthetic test-case tries to
emulate the end user's scenario, for quicker reproduction & validation
of potential/candidate solutions.

Dirk's synthetic scenario leads to the same logs as shared by the user.
Based on that evidence, we believe we found the root cause.

>
> Why would you add a pr_alert() call? What is that for?
>
> totally confused.

pr_alert() acts as a simple delay, accelerating the reproduction.

>
>
> >
> > > > This change is based on the observation of the following race condition:
> > > >
> > > > Thread #1:
> > > > ==========
> > > >
> > > > really_probe() {
> > > > ...
> > > > probe_failed:
> > > > ...
> > > > device_unbind_cleanup(dev) {
> > > > ...
> > > > dev->driver = NULL; // <= Failed probe sets dev->driver to NULL
> > > > ...
> > > > }
> > > > ...
> > > > }
> > > >
> > > > Thread #2:
> > > > ==========
> > > >
> > > > dev_uevent() {
> > >
> > > Wait, how can dev_uevent() be called if probe fails? Who is calling
> > > that?
> >
> > dev_uevent() is called by reading /sys/bus/i2c/devices/<dev>/uevent.
> > Not directly triggered by the probe failure.
> > Please, kindly check the above gist/notes.
>
> Again, put the info in the email so we can properly quote and read it,
> and it's present for the history involved (web pages disappear, email is
> for forever.)

Agreed & will follow in the future (did not want to clutter the e-mail)

>
> So you have userspace hammering on a uevent file? Why is it being
> called if userspace hasn't even been notified that the device has a
> driver bound to it yet? What causes this action?

We know that uevent subsystem is involved, based on the post-mortem logs.
Hence, reading sysfs was the easiest way to translate the real-life
use-case to a synthetic one.

> >
> > [--- cut ---]
> >
> > > > - if (dev->driver)
> > > > - add_uevent_var(env, "DRIVER=%s", dev->driver->name);
> > > > + /* dev->driver can change to NULL underneath us because of unbinding
> > > > + * or failing probe(), so be careful about accessing it.
> > > > + */
> > > > + drv = READ_ONCE(dev->driver);
> > > > + if (drv)
> > > > + add_uevent_var(env, "DRIVER=%s", drv->name);
> > >
> > > Again, you are just reducing the window here. Maybe a bit, but not all
> > > that much overall as there is no real lock present.
> >
> > The main objective of the patch is to "cache" dev->driver, such
> > that it is not cleared asynchronously from a parallel thread.
> > A refined/minimal locking alternative (if feasible) is welcome.
>
> "cacheing" a stale pointer still results in a stale pointer :(

Agreed. So, likely minimal/least-intrusive locking will be necessary.

>
> thanks,
>
> greg k-h

BR, Eugeniu