Re: [PATCH v3 2/5] x86/apic: Fix restoring boot irq mode in reboot and kexec/kdump

From: Dou Liyang
Date: Tue Feb 13 2018 - 22:22:25 EST


Hi Eric,

At 02/14/2018 01:40 AM, Eric W. Biederman wrote:
Dou Liyang <douly.fnst@xxxxxxxxxxxxxx> writes:

Hi Baoquan,

At 02/12/2018 11:08 AM, Eric W. Biederman wrote:
Baoquan He <bhe@xxxxxxxxxx> writes:

This is a regression fix.

Before, to fix erratum AVR31, commit 522e66464467 ("x86/apic: Disable
I/O APIC before shutdown of the local APIC") moved lapic_shutdown()
calling after disable_IO_APIC(). This introdued a regression. The
root cause is that disable_IO_APIC() not only clears IO_APIC, also
restore boot irq mode by setting LAPIC/APIC/IMCR, lapic_shutdown()
after disable_IO_APIC() will disable LAPIC and ruin the possible
virtual wire mode setting which the code has been trying to do all
along.

The consequence is, in KVM guest kernel always prints warning as below
during kexec/kdump kernel boots up. That happened in setup_local_APIC()
since 'do { xxx } while (queued && max_loops > 0)' loop does not function

I am not sure another thing here

AFAIK, according to the order of SMP machine shutdown, other CPUs will
be stopped firstly, then the last CPU disable its local apic.

--machine_shutdown
|----......
|----stop_other_cpus()
|----local_shutdown()

So, the pending interrupts exist only in BSP and only be ACKed by
BSP. is it right?(I validated this by print the value of APIC_IRR/ISR of
all CPUs, found only BSP had the non-zero value).

We don't know. In the kexec on panic case we try and shutdown the other
cpus but we have a timeout because we might fail.

Further you have to be careful with the concept of boot cpu. In the
normal kexec case shutdown on the boot cpu and leave it running. In the
kexec on panic case we shutdown on an arbitrary cpu.

If it is right, We will do not need check the pending interrupt for each
cpus.

It is also cheap if there are no pending interrupts as there is nothing
to do.


Yes, It's cheap.

But, the local APICs in APs were disabled at that time, not sure if
writing to the end-of-interrupt (EOI) register could cause the local
APIC to clear the ISR successfully.

If the ISR was not cleared, doing that check is useless.

I can't produce any cases that the lapics in APs have pending
interrupts. do you have some suggestions?

BTW, the pending interrupt check code is mixed with the local
APIC setup code, that looks messy. How about extracting the code which
related to the crash interrupt check, put it into a new function and
only invoked it when the CPU is BSP?

Moving it into it's own function makes sense. Let's not taint the
concept with ``crash''. We don't know that the only way this will
ever happen is from kexec on panic. We only know it was easy to
trigger the condition from kexec on panic.


Yes, I see.

There are a lot of cases I can think of that interrupts might fire while
interrupts are disabled because the kernel is booting. A normal kexec
is also possible.

Throw in MSI interrupts and transitioning from one state to another in
non-legacy apic mode and we might be more likely to get some irqs in
a pending state.

Your patch makes me nervous as it is not just code motion but a much
more substantial change.


As much as I agree that we need to fix the regression in the apic
shutdown code that is causing problems. What we really need to do
is to completely remove the apic shutdown code from the kexec on panic
code path. Over the long term that will provide us with a much more

Wow, indeed, and it is related to many hypervisors on x86, For me, still
need more investigations and tests.

Thanks,
dou