Re: [PATCH v3 0/4] KVM: x86: hyper-v: Conditionally allow SynIC with APICv/AVIC

From: Maxim Levitsky
Date: Mon Jun 14 2021 - 14:21:53 EST


On Mon, 2021-06-14 at 15:08 +0200, Paolo Bonzini wrote:
> On 14/06/21 11:51, Maxim Levitsky wrote:
> > On Mon, 2021-06-14 at 09:40 +0200, Vitaly Kuznetsov wrote:
> > > Maxim Levitsky <mlevitsk@xxxxxxxxxx> writes:
> > >
> > > > On Wed, 2021-06-09 at 17:09 +0200, Vitaly Kuznetsov wrote:
> > > > > Changes since v2:
> > > > > - First two patches got merged, rebase.
> > > > > - Use 'enable_apicv = avic = ...' in PATCH1 [Paolo]
> > > > > - Collect R-b tags for PATCH2 [Sean, Max]
> > > > > - Use hv_apicv_update_work() to get out of SRCU lock [Max]
> > > > > - "KVM: x86: Check for pending interrupts when APICv is getting disabled"
> > > > > added.
> > > > >
> > > > > Original description:
> > > > >
> > > > > APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
> > > > > SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
> > > > > however, possible to track whether the feature was actually used by the
> > > > > guest and only inhibit APICv/AVIC when needed.
> > > > >
> > > > > > The series can be tested with the following hack:
> > > > >
> > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > > index 9a48f138832d..65a9974f80d9 100644
> > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > @@ -147,6 +147,13 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
> > > > > >  			   vcpu->arch.ia32_misc_enable_msr &
> > > > > >  			   MSR_IA32_MISC_ENABLE_MWAIT);
> > > > > >  	}
> > > > > > +
> > > > > > +	/* Dirty hack: force HV_DEPRECATING_AEOI_RECOMMENDED. Not to be merged! */
> > > > > > +	best = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_ENLIGHTMENT_INFO, 0);
> > > > > > +	if (best) {
> > > > > > +		best->eax &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> > > > > > +		best->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
> > > > > > +	}
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
> > > > >
> > > > > Vitaly Kuznetsov (4):
> > > > > KVM: x86: Use common 'enable_apicv' variable for both APICv and AVIC
> > > > > KVM: x86: Drop vendor specific functions for APICv/AVIC enablement
> > > > > KVM: x86: Check for pending interrupts when APICv is getting disabled
> > > > > KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in
> > > > > use
> > > > >
> > > > > arch/x86/include/asm/kvm_host.h | 9 +++++-
> > > > > arch/x86/kvm/hyperv.c | 51 +++++++++++++++++++++++++++++----
> > > > > arch/x86/kvm/svm/avic.c | 14 ++++-----
> > > > > arch/x86/kvm/svm/svm.c | 22 ++++++++------
> > > > > arch/x86/kvm/svm/svm.h | 2 --
> > > > > arch/x86/kvm/vmx/capabilities.h | 1 -
> > > > > arch/x86/kvm/vmx/vmx.c | 2 --
> > > > > arch/x86/kvm/x86.c | 18 ++++++++++--
> > > > > 8 files changed, 86 insertions(+), 33 deletions(-)
> > > > >
> > > >
> > > > Hi!
> > > >
> > > > I hate to say it, but at least one of my VMs doesn't boot anymore
> > > > with avic=1 after the recent updates. I'll bisect this soon,
> > > > but it is likely related to this series.
> > > >
> > > > I will also review this series very soon.
> > > >
> > > > When the VM fails, it hangs on the OVMF screen and I see this
> > > > in qemu logs:
> > > >
> > > > KVM: injection failed, MSI lost (Operation not permitted)
> > > > KVM: injection failed, MSI lost (Operation not permitted)
> > > > KVM: injection failed, MSI lost (Operation not permitted)
> > > > KVM: injection failed, MSI lost (Operation not permitted)
> > > > KVM: injection failed, MSI lost (Operation not permitted)
> > > > KVM: injection failed, MSI lost (Operation not permitted)
> > > >
> > >
> > > -EPERM?? Interesting... strace(1) may come in handy...
> >
> > Hi Vitaly!
> >
> > I spent all yesterday debugging this and I found out what is going on:
> > (spoiler alert: hacks are bad)
> >
> > The call to kvm_request_apicv_update was moved to delayed work, which looks fine at first glance,
> > but it turns out that neither of us noticed that KVM doesn't allow updating the guest
> > memory map from a non-vCPU thread, which is what kvm_request_apicv_update does
> > on AVIC.
> >
> > The memslot update switches between a regular R/W-mapped dummy page,
> > which is not really used but doesn't hurt to be there, and a paging entry with
> > reserved bits, used for MMIO. AVIC sadly needs the dummy page because the spec says
> > that AVIC's MMIO, despite being redirected to the avic_vapic_bar, still needs a valid
> > R/W mapping in the NPT, whose physical address is ignored.
> >
> > So, in avic_update_access_page we have this nice hack:
> >
> > 	if ((kvm->arch.apic_access_page_done == activate) ||
> > 	    (kvm->mm != current->mm))
> > 		goto out;
> >
> > So instead of crashing, this function just does nothing.
> > AVIC MMIO is therefore still mapped R/W to a dummy page, but the AVIC itself
> > is disabled on all vCPUs by kvm_request_apicv_update (with the
> > KVM_REQ_APICV_UPDATE request).
> >
> > So now all guest APIC writes just disappear to that dummy
> > page, and we have a guest that seems to run but can't really
> > continue.
> >
> > The -EPERM in the error message I reported is just -1, returned by
> > KVM_SIGNAL_MSI, which is likely the result of a gross mismatch between
> > the state of KVM's APIC registers and that dummy page, which contains
> > whatever the guest wrote there and what the guest thinks
> > the APIC registers are.
> >
> > I am currently thinking about how to do the whole thing with
> > KVM's requests; I'll try to prepare a patch today.
>
> I'll drop the last two patches in the series.

Actually, only patch 4 is problematic; patch 3 IMHO does fix an issue.
Best regards,
Maxim Levitsky

>
> Paolo
>
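
The failure mode described in the quoted analysis (the access-page update is silently skipped when the caller is not running in the VM's address space, while AVIC is still disabled on the vCPUs) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `toy_*` name below is hypothetical and none of it is kernel code; the per-vCPU AVIC state is collapsed into a single flag, and only the early-exit condition mirrors the check quoted from avic_update_access_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space; not a kernel type. */
struct toy_mm {
	int id;
};

/* Hypothetical, heavily simplified VM state. */
struct toy_kvm {
	const struct toy_mm *mm;	/* address space that owns the VM */
	bool apic_access_page_done;	/* true: dummy R/W page is mapped */
	bool avic_active;		/* all vCPUs collapsed into one flag */
};

/*
 * Models the early exit quoted above: nothing happens if the page is
 * already in the requested state or if the caller runs in a different mm.
 * Returns true only when the mapping state actually changed.
 */
static bool toy_update_access_page(struct toy_kvm *kvm,
				   const struct toy_mm *current_mm,
				   bool activate)
{
	assert(kvm != NULL);

	if (kvm->apic_access_page_done == activate || kvm->mm != current_mm)
		return false;

	kvm->apic_access_page_done = activate;
	return true;
}

/*
 * Models an APICv update request: the vCPU-side state is always changed,
 * but the mapping follows only if the caller runs in the VM's own mm.
 */
static void toy_request_apicv_update(struct toy_kvm *kvm,
				     const struct toy_mm *current_mm,
				     bool activate)
{
	toy_update_access_page(kvm, current_mm, activate);
	kvm->avic_active = activate;
}

/* The two pieces of state agree when the dummy page is mapped iff AVIC is on. */
static bool toy_state_consistent(const struct toy_kvm *kvm)
{
	return kvm->apic_access_page_done == kvm->avic_active;
}
```

Calling toy_request_apicv_update() with activate == false from an mm other than kvm->mm leaves apic_access_page_done set while avic_active is cleared, which is the inconsistent state the analysis describes; the same call made from the VM's own mm keeps the two in step.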