data-race in dev_uevent / really_probe?

From: Dirk Behme
Date: Fri Apr 26 2024 - 11:53:24 EST


Hi,

While debugging a NULL pointer crash on a fairly old embedded system kernel (4.14.x), we might have found the root cause of:

https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a
https://groups.google.com/g/syzkaller-upstream-moderation/c/xTpwi0C6eSY/m/FqJAQtinAQAJ

Looking at a recent kernel, the relevant code does not seem to have changed much since then. So even in recent kernel code there appears to be a synchronization issue between dev_uevent() and really_probe():

Thread #1:
========

really_probe() {
    ...
probe_failed:
    ...
    device_unbind_cleanup(dev) {
        ...
        dev->driver = NULL; // <= Failed probe sets dev->driver to NULL
        ...
    }
    ...
}

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/dd.c#n552


Thread #2:
========

dev_uevent() {
    ...
    if (dev->driver)
        // If really_probe() sets dev->driver to NULL at this point,
        // after the check above, the system crashes
        add_uevent_var(env, "DRIVER=%s", dev->driver->name);
    ...
}

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/core.c#n2670

The setup is a device driver probe that fails; in our case, the probe of an I2C driver. The failing probe issues some dev_info() and dev_err() output. In our case this seems to trigger systemd-journal (as shown in the groups.google.com link above), which ends up in dev_uevent() via the call stack given there.

In the end, dev_uevent() has checked dev->driver successfully. But if, depending on timing, the failing really_probe() sets dev->driver to NULL right after that check, the system crashes when dev->driver->name is dereferenced.
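To make the window concrete, here is a minimal user-space sketch of the interleaving described above. It is not kernel code: the model_* types and functions, the "between" callback that stands in for the second thread, and the driver name "demo_i2c" are all made up for illustration. The first variant loads the pointer twice (check, then use), as dev_uevent() does. The second loads it once into a local variable, roughly what READ_ONCE() into a local would give in the kernel. Note that the single load only avoids the NULL dereference; it says nothing about whether the driver object is still valid afterwards, so it should not be read as a proposed fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-ins for struct device_driver and struct device. */
struct model_driver { const char *name; };
struct model_device { _Atomic(struct model_driver *) driver; };

/* Stands in for the failed-probe cleanup that clears dev->driver. */
static void model_unbind_cleanup(struct model_device *dev)
{
    atomic_store(&dev->driver, (struct model_driver *)NULL);
}

/*
 * Check-then-use with two separate loads of dev->driver. The "between"
 * callback runs inside the window and plays the role of the other thread.
 * Returns -1 at the point where the kernel would dereference NULL.
 */
static int model_uevent_two_loads(struct model_device *dev,
                                  void (*between)(struct model_device *),
                                  char *buf, size_t len)
{
    struct model_driver *drv;

    if (atomic_load(&dev->driver)) {        /* load 1: the check */
        if (between)
            between(dev);                   /* concurrent unbind lands here */
        drv = atomic_load(&dev->driver);    /* load 2: the use */
        if (!drv)
            return -1;
        snprintf(buf, len, "DRIVER=%s", drv->name);
    }
    return 0;
}

/* Same flow, but the pointer is loaded once and only the local copy is used. */
static int model_uevent_one_load(struct model_device *dev,
                                 void (*between)(struct model_device *),
                                 char *buf, size_t len)
{
    struct model_driver *drv = atomic_load(&dev->driver);

    if (drv) {
        if (between)
            between(dev);                   /* unbind no longer affects drv */
        snprintf(buf, len, "DRIVER=%s", drv->name);
    }
    return 0;
}

/* Replays the interleaving once for each variant. */
static void model_demo(void)
{
    struct model_driver drv = { "demo_i2c" };   /* made-up driver name */
    struct model_device dev;
    char buf[64] = "";

    atomic_init(&dev.driver, &drv);
    assert(model_uevent_two_loads(&dev, model_unbind_cleanup,
                                  buf, sizeof(buf)) == -1);

    atomic_store(&dev.driver, &drv);            /* re-bind for the second run */
    assert(model_uevent_one_load(&dev, model_unbind_cleanup,
                                 buf, sizeof(buf)) == 0);
    assert(strcmp(buf, "DRIVER=demo_i2c") == 0);
}
```

Calling model_demo() replays the unbind between check and use for both variants: the two-load version hits the NULL case, the one-load version still formats the DRIVER= string from its local copy.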

Does that make sense? Or have we missed anything?

Best regards

Dirk