Re: [PATCH] x86, Fix do_IRQ interrupt warning for cpu hotplug retriggeredirqs

From: Prarit Bhargava
Date: Tue Dec 24 2013 - 08:12:44 EST




On 12/23/2013 11:41 PM, rui wang wrote:
> On 12/23/13, Prarit Bhargava <prarit@xxxxxxxxxx> wrote:
> > I think I have root caused this to the IRR being set in the down'd cpu. It
> > is
> > admittedly a rare occurrence in the kernel. I usually have to run about
> > 1000 up
> > and down's before hitting it, however, on my current test system it seems to
> > hit
> > much more frequently, almost 1 in 64 times.
> >
>>
>
> If that's the case, then it means stop_machine() doesn't manage to
> clear the IRR and ISR bits. But why not? Since this cpu is down it's
> not supposed to handle any further interrupts.

ISR bits can be cleared by software (via EOI). IRR bits CANNOT be cleared by
software. Once IRR bits are set the only thing that can clear them is the
processor itself.

IMHO we're supposed to
> send EOIs repeatedly until all the APIC_IRR and APIC_ISR bits are
> cleared. If an IRR bit is set, it means that there's (maybe another
> vector) an APIC_ISR bit set with the highest interrupt priority

I don't think that is quite right (admittedly I may be wrong on this and I'll
have to dig out an Intel spec to double check). As I mentioned above the ISR
can be cleared but the IRR cannot be cleared.

My understanding (also other's understanding on clearing of the IRR) is that
only the processor can clear the IRR. See arch/x86/kernel/apic/apic.c line 1352
and the code below it. Because of kexec boot, we "drain" the IRR by waiting. I
remember talking to the people who wrote that code about the IRR and the
inability to affect a write to clear requested and pending IRQs.

> Sending an EOI clears the highest priority APIC_ISR bit, so the LAPIC
> will then clear the next highest priority IRR bit and set the
> corresponding ISR bit...

But the issue here is that the IRR has been set, but the processor is yet to
execute the irq's handler. That's what makes this error so difficult to hit.
It is a race with the hardware to catch the processor in exactly this state.

We can repeat the process. It's like handling
> interrups in polled mode. That's the right thing to do IMHO.

But the IRR register bits are still set and once the processor is out of
interrupt context (ie, in the cpu_die() code) the processor will attempt to
access the vector for that IRR bit and we'll still get a do_IRQ() warning. That
is different than handling the ISR.

[Aside: Sorry Rui, but it feels like I repeated myself several times about the
IRR :). I mean no disrespect towards you by that :) as I'm just answering each
of your points and pointing out that the IRR is the problem here.]

>
> The other unanswered question is why isn't cpu_online_mask() protected
> by a spin lock ? Being atomic isn't enough.

Perhaps it is thought that the cpu_online_mask() can only be changed by the cpu
hotplug code and that is done via stop_machine() and protected by
cpu_maps_update_begin() and cpu_maps_update_done() in cpu_down()? So an
additional lock is not necessary?

>
> Are these all well-known issues? Are there well-known answers already?

I thought the bug was well-known because of the number of do_IRQ warning reports
I stumbled across while searching for a solution. But as the code says in
fixup_irqs()

/*
* We can remove mdelay() and then send spuriuous interrupts to
* new cpu targets for all the irqs that were handled previously by
* this cpu. While it works, I have seen spurious interrupt messages
* (nothing wrong but still...).

so I don't think anyone has realized what the bug actually was.

So to recap ;), here is what I think the problem is:

1. CPU 5 is to go down.
2. CPU 5 IRQ 12 IRR is set to request service, but ISR is not set.
3. cpu_disable executes via stop_machine()
4. IRQs are migrated off of CPU 5, and set to -1.
4. stop_machine() finishes cpu_disable()
5. cpu_die() code executes in normal context.
6. CPU 5 attempts to handle IRQ 12.
7. do_IRQ warning displays warning about CPU 5 IRQ 12.

P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/