Re: [PATCH] KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC

From: Paolo Bonzini
Date: Tue Jul 02 2019 - 12:43:40 EST


On 02/07/19 11:25, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> Thomas reported that:
>
> | Background:
> |
> | In preparation of supporting IPI shorthands I changed the CPU offline
> | code to software disable the local APIC instead of just masking it.
> | That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
> | register.
> |
> | Failure:
> |
> | When the CPU comes back online the startup code triggers occasionally
> | the warning in apic_pending_intr_clear(). That complains that the IRRs
> | are not empty.
> |
> | The offending vector is the local APIC timer vector who's IRR bit is set
> | and stays set.
> |
> | It took me quite some time to reproduce the issue locally, but now I can
> | see what happens.
> |
> | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
> | (and hardware support) it behaves correctly.
> |
> | Here is the series of events:
> |
> | Guest CPU
> |
> | goes down
> |
> | native_cpu_disable()
> |
> | apic_soft_disable();
> |
> | play_dead()
> |
> | ....
> |
> | startup()
> |
> | if (apic_enabled())
> | apic_pending_intr_clear() <- Not taken
> |
> | enable APIC
> |
> | apic_pending_intr_clear() <- Triggers warning because IRR is stale
> |
> | When this happens then the deadline timer or the regular APIC timer -
> | happens with both, has fired shortly before the APIC is disabled, but the
> | interrupt was not serviced because the guest CPU was in an interrupt
> | disabled region at that point.
> |
> | The state of the timer vector ISR/IRR bits:
> |
> | ISR IRR
> | before apic_soft_disable() 0 1
> | after apic_soft_disable() 0 1
> |
> | On startup 0 1
> |
> | Now one would assume that the IRR is cleared after the INIT reset, but this
> | happens only on CPU0.
> |
> | Why?
> |
> | Because our CPU0 hotplug is just for testing to make sure nothing breaks
> | and goes through an NMI wakeup vehicle because INIT would send it through
> | the boots-trap code which is not really working if that CPU was not
> | physically unplugged.
> |
> | Now looking at a real world APIC the situation in that case is:
> |
> | ISR IRR
> | before apic_soft_disable() 0 1
> | after apic_soft_disable() 0 1
> |
> | On startup 0 0
> |
> | Why?
> |
> | Once the dying CPU reenables interrupts the pending interrupt gets
> | delivered as a spurious interupt and then the state is clear.
> |
> | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
> | emulation is still wrong, Even if the play_dead() code would not enable
> | interrupts then the pending IRR bit would turn into an ISR .. interrupt
> | when the APIC is reenabled on startup.
>
>
> From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
> * Pending interrupts in the IRR and ISR registers are held and require
> masking or handling by the CPU.
>
> In Thomas's testing, hardware cpu will not respect soft disable LAPIC
> when IRR has already been set or APICv posted-interrupt is in flight,
> so we can skip soft disable APIC checking when clearing IRR and set ISR,
> continue to respect soft disable APIC when attempting to set IRR.
>
> Reported-by: Rong Chen <rong.a.chen@xxxxxxxxx>
> Reported-by: Feng Tang <feng.tang@xxxxxxxxx>
> Reported-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Tested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Rong Chen <rong.a.chen@xxxxxxxxx>
> Cc: Feng Tang <feng.tang@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> ---
> arch/x86/kvm/lapic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 05d8934..f857a12 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2376,7 +2376,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
> struct kvm_lapic *apic = vcpu->arch.apic;
> u32 ppr;
>
> - if (!apic_enabled(apic))
> + if (!kvm_apic_hw_enabled(apic))
> return -1;
>
> __apic_update_ppr(apic, &ppr);
>

Queued, thanks.

Paolo