Re: [tip:x86/apic] x86: Use EOI register in io-apic on intelplatforms

From: Suresh Siddha
Date: Tue Dec 01 2009 - 19:57:35 EST


Maciej,

Sorry for delayed response. I was busy with other stuff and didn't get a
chance till now to get back on this. I just posted few patches which
came up as a result of our past discussions here in this thread.

Please see my responses inline for your earlier comments:

On Sun, 2009-11-08 at 11:06 -0800, Maciej W. Rozycki wrote:
> OK, I see what you mean, but that makes me wonder why you are going
> through such contortions. In the case of a CPU going offline I would
> expect it to be done more or less in such a way:
>
> 1. Write all-zeroes to its local APIC's LDR register and set its TPR to
> 0xef. This will take this APIC out from LoPri arbitration and thus
> from accepting any I/O APIC interrupts. Fixed delivery mode IPIs will
> still be accepted (if that's not needed then the TPR can be set to
> 0xff; any received IPIs will be lost).
>
> 2. Service any outstanding interrupts that have already been accepted by
> the local APIC (you may have to poll on the local IRR register with
> interrupts enabled for a short while).
>
> 3. Disable the local APIC via the SVR register, mask local interrupts in
> the processor's EFLAGS register and start the offline procedure. This
> is the point of no-return, further IPIs won't be accepted and the CPU
> has to be put through an INIT-IPI+StartUp-IPI cycle to get in control
> again.
>
> If IPI reception was not needed through stage #2 above, then the local
> APIC could have been disabled at #1 instead -- interrupts pending in
> the local APIC as recorded in the IRR or marked as in-progress in the
> ISR are guaranteed to be delivered to the CPU and EOIed (as
> appropriate) normally even in the disabled state of the local APIC.

But before these 3 steps you listed here, we need to migrate the irq to
the new destination. And that step will modify the IO-APIC RTE with the
new vector and new destination. And during this process, remoteIRR of
the IO-APIC RTE might be set and this inflight interrupt will get
registered at the original destination that is going offline.

So when we come to step 2 you listed above and service any outstanding
interrupts, EOI broadcast as part of that handling won't clear the
remoteIRR of the IO-APIC RTE, as the vector information in the io-apic
RTE got modified because of irq migration and is different from the
vector information in EOI broadcast message sent by the cpu. This will
result in stuck level interrupt.

This is one of the challenges Eric Biederman had in the past and he
tried things like polling from the process context and modifying the
IO-APIC RTE (with new destination and vector information) only when the
remoteIRR is not set etc. But Eric still saw some hangs and stuck
interrupt conditions with his experimental patches.

We took a route which needed minor changes to the existing code and fix
the local_irq_enable()/local_irq_disable() issue and stuck interrupt
issue in the cpu offline path by using the IO-APIC EOI register. Our
tests on Intel platforms having an EOI register for io-apic's and IBM's
(Gary) tests on io-apic's which don't have EOI register using AMD
platforms worked fine with our approach.

> > Do you agree?
>
> If the scenario I have outlined above cannot be made to work for some
> reason,

Perhaps we can make it work but it needs more changes and validation.
And atleast Eric's similar experiments in the past didn't yield good
results.

> then please do me and the others a favour and since with this
> change you are tying new functionality to code originally meant as a
> workaround for an obscure erratum only, do write a proper explanation and
> place it next to the original comment describing previous use of the code.
> With your change as it is, it is all but obvious what this piece of code
> is meant to do.

This patch was also in the series that I just sent.

> Your change is OK with me once accompanied with said comment, but please
> investigate my scenario first -- your approach looks like a horrible hack
> to me, sorry.

This being a fragile area and considering our experiences in the past, I
leveraged the existing code (of clearing remoteIRR manually on ioapic's
which don't have an EOI register) and luckily we had good success so
far.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/