Re: smp_call_function_single lockups

From: Linus Torvalds
Date: Thu Feb 19 2015 - 17:45:59 EST


On Thu, Feb 19, 2015 at 1:59 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Is this worth looking at? Or is it something spurious? I might have
> gotten the vectors wrong, and maybe the warning is not because the ISR
> bit isn't set, but because I test the wrong bit.

I edited the patch to do ratelimiting (one per 10s max) rather than
"once". And tested it some more. It seems to work correctly. The irq
case during 8042 probing is not repeatable, and I suspect it happens
because the interrupt source goes away (some probe-time thing that
first triggers an interrupt, but then clears it itself), so it doesn't
happen every boot, and I've gotten it with slightly different
backtraces.

But it's the only warning that happens for me, so I think my code is
right (at least for the cases that trigger on this machine). It's
definitely not an "every interrupt causes the warning because the code
was buggy, and the WARN_ONCE() just printed the first one" case.

It would be interesting to hear if others see spurious APIC EOI cases
too. In particular, the people seeing the IPI lockup. Because a lot of
the lockups we've seen have *looked* like the IPI interrupt just never
happened, and so we're waiting forever for the target CPU to react to
it. And just maybe the spurious EOI could cause the wrong bit to be
cleared in the ISR, and then the interrupt never shows up. Something
like that would certainly explain why it only happens on some machines
and under certain timing circumstances.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/