Re: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt injection
From: Paolo Bonzini
Date: Wed Sep 28 2016 - 06:17:01 EST
On 28/09/2016 12:04, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:paolo.bonzini@xxxxxxxxx] On Behalf Of Paolo
>> Bonzini
>> Sent: Wednesday, September 28, 2016 5:20 AM
>> To: linux-kernel@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx
>> Cc: yang.zhang.wz@xxxxxxxxx; Wu, Feng <feng.wu@xxxxxxxxx>;
>> mst@xxxxxxxxxx; rkrcmar@xxxxxxxxxx
>> Subject: [PATCH 2/3] kvm: x86: do not use KVM_REQ_EVENT for APICv interrupt
>> injection
>>
>> Since bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU
>> is blocked", 2015-09-18) the posted interrupt descriptor is checked
>> unconditionally for PIR.ON. Therefore we don't need KVM_REQ_EVENT to
>> trigger the scan and, if NMIs or SMIs are not involved, we can avoid
>> the complicated event injection path.
>
> But the following code still remains in the KVM_REQ_EVENT checking part:
>
>         if (kvm_lapic_enabled(vcpu)) {
>                 update_cr8_intercept(vcpu);
>                 kvm_lapic_sync_to_vapic(vcpu);
>         }
>
> Does this matter?

Good question, but it doesn't matter for APICv because:

- update_cr8_intercept is disabled when APICv is enabled, see vmx.c:

        if (enable_apicv)
                kvm_x86_ops->update_cr8_intercept = NULL;

- kvm_lapic_sync_to_vapic's call to apic_sync_pv_eoi_to_guest is also
  a no-op with APICv, because it always takes this early return:

        if (!pv_eoi_enabled(vcpu) ||
            apic->irr_pending ||
            apic->highest_isr_cache == -1 ||
            kvm_ioapic_handles_vector(apic, apic->highest_isr_cache))
                return;

  (highest_isr_cache is always -1 for APICv)

- The TPR/ISR/IRR shadow that kvm_lapic_sync_to_vapic writes is only
  read by the paravirtualized TPR access code in the vAPIC ROM
  (pc-bios/optionrom/kvmvapic.S in the QEMU tree). That code never runs
  unless you get TPR access vmexits, and TPR access vmexits never
  happen if KVM uses APICv (or even just the old-style TPR shadowing).
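
To make the second point concrete, here is a minimal userspace sketch of
that guard. It is not the kernel code: struct toy_apic and both toy_*
functions are hypothetical names invented for illustration, and the four
fields only stand in for the conditions tested above. It assumes, as
stated, that highest_isr_cache stays at -1 under APICv.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the state the guard reads. */
struct toy_apic {
	bool pv_eoi_enabled;        /* models pv_eoi_enabled(vcpu) */
	bool irr_pending;           /* models apic->irr_pending */
	int  highest_isr_cache;     /* -1 means no cached ISR vector */
	bool ioapic_handles_vector; /* models kvm_ioapic_handles_vector() */
};

/* Returns true only if the guard falls through, i.e. PV EOI would be armed. */
static bool toy_would_set_pv_eoi(const struct toy_apic *apic)
{
	if (!apic->pv_eoi_enabled ||
	    apic->irr_pending ||
	    apic->highest_isr_cache == -1 ||
	    apic->ioapic_handles_vector)
		return false;
	return true;
}

/* Models the APICv case: the ISR cache is never populated (assumption). */
static struct toy_apic toy_apicv_state(bool pv_eoi, bool irr_pending)
{
	struct toy_apic apic = {
		.pv_eoi_enabled = pv_eoi,
		.irr_pending = irr_pending,
		.highest_isr_cache = -1,
		.ioapic_handles_vector = false,
	};
	return apic;
}
```

Whatever values are passed to toy_apicv_state, toy_would_set_pv_eoi
returns false, which is why skipping the call loses nothing with APICv.
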
Paolo
> Thanks,
> Feng
>
>>
>> However, there is a race between vmx_deliver_posted_interrupt and
>> vcpu_enter_guest. Fix it by disabling interrupts before vcpu->mode is
>> set to IN_GUEST_MODE.
>>
>> Calling kvm_vcpu_kick if PIR.ON=1 is also useless, though it has been
>> there since APICv was introduced.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/lapic.c | 2 --
>> arch/x86/kvm/vmx.c | 8 +++++---
>> arch/x86/kvm/x86.c | 9 +++++++--
>> 3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 63a442aefc12..be8b7ad56dd1 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -356,8 +356,6 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32
>> *pir)
>> struct kvm_lapic *apic = vcpu->arch.apic;
>>
>> __kvm_apic_update_irr(pir, apic->regs);
>> -
>> - kvm_make_request(KVM_REQ_EVENT, vcpu);
>> }
>> EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index b33eee395b00..207b9aa32915 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4844,9 +4844,11 @@ static void vmx_deliver_posted_interrupt(struct
>> kvm_vcpu *vcpu, int vector)
>> if (pi_test_and_set_pir(vector, &vmx->pi_desc))
>> return;
>>
>> - r = pi_test_and_set_on(&vmx->pi_desc);
>> - kvm_make_request(KVM_REQ_EVENT, vcpu);
>> - if (r || !kvm_vcpu_trigger_posted_interrupt(vcpu))
>> + /* If a previous notification has sent the IPI, nothing to do. */
>> + if (pi_test_and_set_on(&vmx->pi_desc))
>> + return;
>> +
>> + if (!kvm_vcpu_trigger_posted_interrupt(vcpu))
>> kvm_vcpu_kick(vcpu);
>> }
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 3ee8a91a78c3..604cfbfc5bee 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6658,6 +6658,13 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> kvm_x86_ops->prepare_guest_switch(vcpu);
>> if (vcpu->fpu_active)
>> kvm_load_guest_fpu(vcpu);
>> +
>> + /*
>> + * Disable IRQs before setting IN_GUEST_MODE, so that
>> + * posted interrupts with vcpu->mode == IN_GUEST_MODE
>> + * always result in virtual interrupt delivery.
>> + */
>> + local_irq_disable();
>> vcpu->mode = IN_GUEST_MODE;
>>
>> srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>> @@ -6671,8 +6678,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> */
>> smp_mb__after_srcu_read_unlock();
>>
>> - local_irq_disable();
>> -
>> if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>> || need_resched() || signal_pending(current)) {
>> vcpu->mode = OUTSIDE_GUEST_MODE;
>> --
>> 1.8.3.1
>>
>