Re: [PATCH] KVM: X86: fix tlb_flush_guest()
From: Maxim Levitsky
Date: Sat May 29 2021 - 18:12:57 EST
On Thu, 2021-05-27 at 16:13 +0000, Sean Christopherson wrote:
> +Maxim - A proper fix for this bug might fix your shadow paging + win10 boot
> issue, as this also affects the KVM_REQ_HV_TLB_FLUSH path used for Hyper-V
> PV flushing.
It still crashes with this patch, sadly (tested this time on my AMD laptop).
This win10 boot bug with TDP disabled is 100% reproducible,
although the exact failure varies: sometimes the VM reboots right
at the start of boot, sometimes it just hangs forever,
and sometimes it bluescreens (for various reasons).
That is consistent with random paging-related corruption, though.
I'll look at this patch more carefully soon.
I also have another bug, which I had put aside due to its complexity,
that involves running Hyper-V itself nested and then migrating the L1;
this leads to the Hyper-V VM bluescreening sooner or later
(I don't remember anymore whether that was on both Intel and AMD or only
Intel, but it didn't involve disabling npt/ept),
so I'll see if this patch helps with that bug as well.
Thanks,
Best regards,
Maxim Levitsky
>
> On Thu, May 27, 2021, Paolo Bonzini wrote:
> > On 27/05/21 04:39, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> > >
> > > For KVM_VCPU_FLUSH_TLB used in kvm_flush_tlb_multi(), the guest expects
> > > the hypervisor to perform an operation equivalent to native_flush_tlb_global()
> > > or invpcid_flush_all() on the specified guest CPU.
> > >
> > > When TDP is enabled, it is enough to just flush the hardware
> > > TLB of the specified guest CPU.
> > >
> > > But when shadow paging is used, the hypervisor has to sync the
> > > shadow page tables first, before flushing the hardware TLB, so that
> > > it can truly emulate the operation of invpcid_flush_all() in the guest.
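
For context, roughly how the guest requests this: kvm_flush_tlb_multi()
skips the IPI for preempted vCPUs and instead sets KVM_VCPU_FLUSH_TLB in
their steal-time flags, so the host has to do the full flush on their
behalf before they run again.  A simplified sketch, paraphrased from
arch/x86/kernel/kvm.c; details may differ:

/* Guest side, simplified sketch: queue the flush for preempted vCPUs
 * instead of IPIing them; the host flushes for them before they run. */
static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
				const struct flush_tlb_info *info)
{
	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
	int cpu;

	cpumask_copy(flushmask, cpumask);
	for_each_cpu(cpu, flushmask) {
		struct kvm_steal_time *st = &per_cpu(steal_time, cpu);
		u8 state = READ_ONCE(st->preempted);

		/* Preempted vCPU: hand the flush to the host, skip the IPI. */
		if ((state & KVM_VCPU_PREEMPTED) &&
		    try_cmpxchg(&st->preempted, &state,
				state | KVM_VCPU_FLUSH_TLB))
			__cpumask_clear_cpu(cpu, flushmask);
	}

	native_flush_tlb_multi(flushmask, info);
}
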
> >
> > Can you explain why?
>
> KVM's unsync logic hinges on guest TLB flushes. For page permission modifications
> that require a TLB flush to take effect, e.g. making a writable page read-only,
> KVM waits until the guest explicitly does said flush to propagate the changes to
> the shadow page tables. E.g. failure to sync PTEs could result in a read-only 4k
> page being writable when the guest expects it to be read-only.
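
To make the guest-visible requirement concrete, here is a hypothetical
guest sequence (illustration only, not real kernel code) that relies on
exactly this:

/* Hypothetical illustration, not actual kernel code. */
static void guest_write_protect(pte_t *ptep)
{
	/* 1. Drop write permission in the guest page tables. */
	set_pte(ptep, pte_wrprotect(*ptep));

	/*
	 * 2. Flush all TLBs; with PV TLB flushing this may become a
	 *    KVM_VCPU_FLUSH_TLB hint instead of an IPI for preempted vCPUs.
	 */
	flush_tlb_all();

	/*
	 * 3. The guest now assumes any write to the page faults.  If KVM
	 *    only flushes the hardware TLB and leaves an unsync'd,
	 *    still-writable SPTE in place, the write silently succeeds.
	 */
}
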
>
> > Also it is simpler to handle this in kvm_vcpu_flush_tlb_guest, using "if
> > (tdp_enabled)".  This also provides a single, good place to add a comment
> > explaining what invalid entries KVM_REQ_MMU_RELOAD is preventing.
>
> Ya.
>
> KVM_REQ_MMU_RELOAD is overkill; nuking the shadow page tables will completely
> offset the performance gains of the paravirtualized flush.
>
> And making a request won't work without revamping the order of request handling
> in vcpu_enter_guest(), e.g. KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_SYNC are both
> serviced before KVM_REQ_STEAL_UPDATE.
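
For reference, the ordering in question looks roughly like this (abridged
sketch of vcpu_enter_guest()'s request handling, from memory; the exact
code may differ):

/* Abridged sketch of vcpu_enter_guest()'s request handling order. */
static void request_order_sketch(struct kvm_vcpu *vcpu)
{
	if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
		kvm_mmu_unload(vcpu);

	if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
		kvm_mmu_sync_roots(vcpu);

	/* ... many other requests ... */

	/*
	 * The PV flush is only noticed here, via the steal-time update,
	 * i.e. after the MMU requests above have already been serviced,
	 * so queueing a sync/reload request from the flush path would
	 * not take effect until the next pass.
	 */
	if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
		record_steal_time(vcpu);  /* -> kvm_vcpu_flush_tlb_guest() */
}
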
>
> Cleaning up and documenting the MMU related requests is on my todo list, but the
> immediate fix should be tiny and I can do my cleanups on top.
>
> I believe the minimal fix is:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 81ab3b8f22e5..b0072063f9bf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3072,6 +3072,9 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
>
>  static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.tlb_flush;
> +
> +	if (!tdp_enabled)
> +		kvm_mmu_sync_roots(vcpu);
>  	static_call(kvm_x86_tlb_flush_guest)(vcpu);
>  }
>
>