Re: [Merge tag 'media/v4.11-1' of git] ff58d005cd: BUG: unable to handle kernel NULL pointer dereference at 0000039c

From: Ingo Molnar
Date: Tue Feb 28 2017 - 05:28:53 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> In other words: what will happen is that distros start getting bootup problem
> reports six months or a year after we've done it, and *if* they figure out it's
> the irq enabling, they'll disable it, because they have no way to solve it
> either.
>
> And core developers will just maybe see the occasional "4.12 doesn't boot for
> me" reports, but by then developers will have moved on to 4.16 or something.

Yeah, you are right, there are over 2,100 request_irq() calls in the kernel and
perhaps only 1% of them get tested on real hardware by the time a change gets
upstream :-/

So in theory we could require that all *new* drivers handle this properly, as new
drivers tend to come through developers who can fix such bugs - which would at
least guarantee that with time the problem would obsolete itself.

But I cannot see an easy non-intrusive way to do that - we'd have to rename all
existing request_irq() calls:

- We could rename request_irq() to request_irq_legacy() - which does not do the
tests.

- The 'new' request_irq() function would do the tests unconditionally.

... and that's just too much churn - unless you think it's worth it, or if anyone
can think of a better method to phase in the new behavior without affecting old
users.
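To make the request_irq_legacy() / request_irq() split concrete, here is a minimal userspace sketch. It is not kernel code: every mock_* name is hypothetical, and it assumes that "the tests" amount to invoking the freshly registered handler once, as if an interrupt had arrived at an unexpected moment.

```c
#include <stddef.h>

#define MOCK_NR_IRQS 16

/* Hypothetical stand-in for the kernel's handler type. */
typedef int (*mock_irq_handler_t)(int irq, void *dev_id);

struct mock_irq_action {
	mock_irq_handler_t handler;
	void *dev_id;
};

static struct mock_irq_action mock_actions[MOCK_NR_IRQS];

/* Old behavior under the new name: register the handler, run no test. */
static int mock_request_irq_legacy(int irq, mock_irq_handler_t handler,
				   void *dev_id)
{
	if (irq < 0 || irq >= MOCK_NR_IRQS || !handler)
		return -1;

	mock_actions[irq].handler = handler;
	mock_actions[irq].dev_id = dev_id;
	return 0;
}

/*
 * New behavior under the old name: register, then unconditionally call
 * the handler once to simulate an interrupt the driver did not expect.
 * (Assumption: this is what "the tests" would do.)
 */
static int mock_request_irq(int irq, mock_irq_handler_t handler, void *dev_id)
{
	int ret = mock_request_irq_legacy(irq, handler, dev_id);

	if (ret == 0)
		handler(irq, dev_id);
	return ret;
}

/* Hypothetical driver state, used only to observe the difference. */
struct mock_dev {
	int ready;	/* set once driver initialization has finished */
	int handled;	/* interrupts serviced while ready */
	int ignored;	/* interrupts that arrived before the device was ready */
};

/* A handler that copes with early interrupts instead of crashing. */
static int mock_dev_irq(int irq, void *dev_id)
{
	struct mock_dev *dev = dev_id;

	(void)irq;
	if (!dev->ready) {
		dev->ignored++;
		return 0;	/* not ours / not ready yet */
	}
	dev->handled++;
	return 1;
}
```

In this sketch a driver converted mechanically to mock_request_irq_legacy() sees no change, while a driver using mock_request_irq() gets one early call and has to tolerate it, as mock_dev_irq() does by checking its ready flag.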

Another, less intrusive method would be to introduce a new request_irq_shared()
API call, mark request_irq() obsolete (without putting warnings into the build
though), and put a check into checkpatch that warns about request_irq() use.

The flip side would be that:

- request_irq() is such a nice and well-known name to waste

- plus request_irq_shared() is a misnomer, as this has nothing to do with sharing
IRQs, it's about getting IRQs at unexpected moments.

I'd rather do the renaming: it is easy to automate and the pain is one time only.

Or revert the retrigger change and muddle through, although as per Thomas's
observations spurious interrupts are very common.

Thanks,

Ingo