Re: [GIT PULL] Driver core changes for 7.0-rc1
From: Gary Guo
Date: Sun Mar 01 2026 - 08:02:00 EST
On Sun Mar 1, 2026 at 7:44 AM GMT, Linus Torvalds wrote:
> On Wed, 11 Feb 2026 at 15:04, Danilo Krummrich <dakr@xxxxxxxxxx> wrote:
>>
>> Driver core changes for 7.0-rc1
>>
>> - Bus:
>>   - Ensure bus->match() is consistently called with the device lock held
>
> So I'm coming back to this, because it turns out this sounds like a
> horrible mistake in the end.
>
> You document it as being about consistent locking, but it appears this
> change is what made the "firewire oops at driver attach" turn an oops
> into just a silently dead machine.
>
> In other words, it makes fragile drivers go from "you get an oops" to
> something much worse. The oops becomes unrecoverable - with typically
> a black screen at boot - because the probe is holding a lock that then
> makes everything else come to a grinding halt when the driver fails.
>
> And yes, this obviously only happens for buggy drivers and doesn't
> matter for _correct_ code, but about half of the kernel code is
> drivers, and that half of the kernel code is also typically the
> most badly tested and often questionably implemented half.
>
> No, not all drivers are bad, but there are a lot of drivers, and some of
> them have problems.
>
> So if a driver problem causes problems for the whole machine, the
> driver core design is bad.
>
> I really think this should be re-thought. Perhaps just reverted
> outright. Instead of saying
>
> "This inconsistency means that
> bus match() callbacks are not guaranteed to be called with the lock
> held"
>
> as if it's automatically a bad thing, just don't depend on the device
> match having to be called with a lock held if that lock has this
> problem.
Note that taking the device lock in match() fixes a real bug, where a data
race can lead to a use-after-free:
https://bugzilla.kernel.org/show_bug.cgi?id=220789. It is mentioned in the patch
https://lore.kernel.org/lkml/20260113162843.12712-1-hanguidong02@xxxxxxxxx/.
>
> It's not clear why anybody should *care* about the lock at driver
> attach time, when nothing else can access the device that hasn't been
> brought up yet.
We have always taken the device lock when probing. This is needed because you
obviously don't want two drivers attaching to the same device at the same
time. When probe oopses, the device lock is never going to be unlocked again.
However, before match() started taking the lock, we were "fine" in the sense
that everything else kept working: unless a device was matched and would
actually require probing, the device lock was not touched.
Perhaps what we should do is defend against drivers oopsing inside probe, with
a mechanism so that device locks are released even when probe oopses. Another
option is to protect `driver_override` with a different lock, so that match()
takes that lock instead of the device lock.
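To make the second option a bit more concrete, here is a rough userspace
sketch using pthreads. It is only a model, not kernel code: `struct
model_device`, `override_lock`, `model_match()` and `model_set_override()` are
hypothetical names made up for illustration, and a pthread mutex stands in for
the device lock. The point is only that match() stays usable while the device
lock is stuck, assuming `driver_override` is the only state match() needs
protected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; this is NOT the kernel's struct device. */
struct model_device {
	pthread_mutex_t dev_lock;      /* stands in for the device lock held across probe() */
	pthread_mutex_t override_lock; /* hypothetical lock guarding only driver_override */
	const char *driver_override;   /* NULL means "no override set" */
	const char *name;
};

/* Writers update driver_override under override_lock only. */
static void model_set_override(struct model_device *dev, const char *override)
{
	pthread_mutex_lock(&dev->override_lock);
	dev->driver_override = override;
	pthread_mutex_unlock(&dev->override_lock);
}

/* match() reads driver_override under override_lock and never touches dev_lock. */
static bool model_match(struct model_device *dev, const char *drv_name)
{
	bool matched;

	pthread_mutex_lock(&dev->override_lock);
	if (dev->driver_override)
		matched = strcmp(dev->driver_override, drv_name) == 0;
	else
		matched = strcmp(dev->name, drv_name) == 0;
	pthread_mutex_unlock(&dev->override_lock);

	return matched;
}

/*
 * Models a probe() that died while holding the device lock, then checks that
 * matching still works. Returns 1 if match() behaved as expected.
 */
static int demo_match_survives_stuck_probe(void)
{
	struct model_device dev = { .driver_override = NULL, .name = "fw-dev" };
	bool by_name, by_override;

	pthread_mutex_init(&dev.dev_lock, NULL);
	pthread_mutex_init(&dev.override_lock, NULL);

	/* Held for the rest of the demo, as if probe() oopsed with it taken. */
	pthread_mutex_lock(&dev.dev_lock);

	/* Anything that needs the device lock is now stuck (non-recursive mutex). */
	assert(pthread_mutex_trylock(&dev.dev_lock) != 0);

	/* Setting the override and matching only need override_lock. */
	model_set_override(&dev, "other-driver");
	by_name = model_match(&dev, "fw-dev");
	by_override = model_match(&dev, "other-driver");

	/* Cleanup only; a real oops would never get here. */
	pthread_mutex_unlock(&dev.dev_lock);
	pthread_mutex_destroy(&dev.dev_lock);
	pthread_mutex_destroy(&dev.override_lock);

	return (!by_name && by_override) ? 1 : 0;
}
```

Calling `demo_match_survives_stuck_probe()` from a small main() should return
1. This says nothing about whether a second lock is the right trade-off in the
driver core; it only shows the shape of the idea.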
Best,
Gary
>
> Put another way: the downsides seem worse than the upsides.
> "Consistency" is not an upside if it causes problems.
>
> Linus