Re: [GIT PULL] Driver core changes for 7.0-rc1
From: Danilo Krummrich
Date: Sun Mar 01 2026 - 08:04:33 EST
On Sun Mar 1, 2026 at 8:44 AM CET, Linus Torvalds wrote:
> So I'm coming back to this, because it turns out this sounds like a
> horrible mistake in the end.
I came to the same conclusion following the discussion around the firewire oops.
> You document it as being about consistent locking
It happens that quite a few busses rely on this, and there is a possible race
condition that can lead to UAF bugs in the context of driver_override.
I think it is rather unlikely to happen though, as it would require a user to
change a device's driver_override field through sysfs while the device is
matched with a driver.
In any case, this can easily be solved with a separate lock.
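
A minimal userspace sketch of the separate-lock idea, assuming a pthread mutex
in place of a kernel lock. All names here (demo_device, override_lock,
demo_set_driver_override(), demo_match_driver_override()) are hypothetical and
are not the actual kernel change; the sketch only shows the writer and the
matcher serializing on a lock that is independent of the device lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device; NOT the kernel's struct device. */
struct demo_device {
	pthread_mutex_t override_lock;	/* protects driver_override only */
	char *driver_override;		/* NULL means no override is set */
};

static void demo_device_init(struct demo_device *dev)
{
	pthread_mutex_init(&dev->override_lock, NULL);
	dev->driver_override = NULL;
}

/*
 * Writer side, roughly what a sysfs store would do. An empty or NULL
 * name clears the override. Returns 0 on success, -1 if allocation fails.
 */
static int demo_set_driver_override(struct demo_device *dev, const char *name)
{
	char *fresh = NULL, *old;

	if (name && *name) {
		size_t len = strlen(name) + 1;

		fresh = malloc(len);
		if (!fresh)
			return -1;
		memcpy(fresh, name, len);
	}

	pthread_mutex_lock(&dev->override_lock);
	old = dev->driver_override;
	dev->driver_override = fresh;
	pthread_mutex_unlock(&dev->override_lock);

	/* Readers only dereference the string under the lock, so this is safe. */
	free(old);
	return 0;
}

/*
 * Reader side, roughly what a bus match callback would do. Returns true
 * if no override is set or if the override names drv_name.
 */
static bool demo_match_driver_override(struct demo_device *dev,
				       const char *drv_name)
{
	bool match;

	pthread_mutex_lock(&dev->override_lock);
	match = !dev->driver_override ||
		strcmp(dev->driver_override, drv_name) == 0;
	pthread_mutex_unlock(&dev->override_lock);

	return match;
}
```

Because the string is only read and replaced under override_lock, the matcher
can never compare against a string that the writer has already freed, and
neither side needs the device lock for this.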
> In other words, it makes fragile drivers go from "you get an oops" to
> something much worse. The oops becomes unrecoverable - with typically
> a black screen at boot - because the probe is holding a lock that then
> makes everything else come to a grinding halt when the driver fails.
Yes, the problem is that when a device is already present in the system and a
driver is registered on the same bus, we iterate over all devices registered on
this bus to see if one of them matches. If we come across an already bound one
whose driver crashed while holding the device lock (e.g. in probe()), we cannot
make any further progress.
Obviously, this is not an issue the other way around, i.e. when the driver is
present in the system first and the device is added subsequently.
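
The stall described above can be sketched in userspace as follows, assuming
pthread mutexes as a stand-in for the device lock. The names (demo_dev,
demo_probe_crashes(), demo_walk_devices()) are hypothetical. A real walk would
block on the lock; the sketch uses pthread_mutex_trylock() purely so that it
can report the stuck device instead of hanging.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on a bus. */
struct demo_dev {
	pthread_mutex_t lock;	/* stand-in for the device lock */
};

/*
 * Simulate a probe() that crashed while holding the device lock:
 * the lock is taken and never released.
 */
static void demo_probe_crashes(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	/* The crashed task never reaches pthread_mutex_unlock(). */
}

/*
 * Walk all devices the way a driver registration would, taking each
 * device's lock in turn. Returns the index of the first device whose
 * lock cannot be taken, or -1 if the walk completes.
 */
static int demo_walk_devices(struct demo_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pthread_mutex_trylock(&devs[i].lock) != 0)
			return (int)i;	/* a blocking lock would hang here */

		/* Matching and probing would happen here. */
		pthread_mutex_unlock(&devs[i].lock);
	}

	return -1;
}
```

With three devices and a crashed probe on the second one, the walk stops at
index 1 and never reaches the third device, which is the "can't make any
progress" case.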
> And yes, this obviously only happens for buggy driver and doesn't
> matter for _correct_ code, but about half of the kernel code is
> drivers, and that half of the kernel code is also the typically the
> most badly tested and often questionably implemented half.
I agree, it is a case that will happen regularly, and besides hurting developer
ergonomics, it potentially reduces the chances of shutting things down cleanly
and obtaining logs in a production environment as well.
> I really think this should be re-thought. Perhaps just reverted
> outright.
Yes, I agree and in fact I already have a few local changes to move
driver_override to struct device, provide corresponding accessors for busses and
handle locking with a separate lock.
(Technically, the "move driver_override to struct device" part is orthogonal,
but doing it right away results in fewer (and much cleaner) changes.)
I do not consider those changes to be complicated or risky, but I'm not sure
you want to see those for one of the upcoming -rc releases (probably -rc4/5).
Independently, I can send a revert for -rc3.
Thanks,
Danilo