Re: [PATCH 4.19 167/211] KVM: x86: Manually calculate reserved bits when loading PDPTRS
From: Greg Kroah-Hartman
Date: Mon Nov 11 2019 - 13:08:26 EST
On Mon, Nov 11, 2019 at 09:57:19AM -0800, Sean Christopherson wrote:
> On Mon, Nov 11, 2019 at 06:48:59PM +0100, Greg Kroah-Hartman wrote:
> > On Mon, Nov 11, 2019 at 09:37:57AM -0800, Sean Christopherson wrote:
> > > On Mon, Nov 11, 2019 at 10:32:05AM +0100, Thomas Lamprecht wrote:
> > > > On 10/3/19 5:53 PM, Greg Kroah-Hartman wrote:
> > > > > From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > > > >
> > > > > commit 16cfacc8085782dab8e365979356ce1ca87fd6cc upstream.
> > > > >
> > > > > Manually generate the PDPTR reserved bit mask when explicitly loading
> > > > > PDPTRs. The reserved bits that are being tracked by the MMU reflect the
> > > > It seems that a backport of this to stable and distro kernels surfaced
> > > > an issue[0] for 64-bit KVM Linux guests on Intel CPUs older than about
> > > > 8-10 years[1].
> > >
> > > It manifests specifically when running with EPT disabled (no surprise
> > > there). Actually, it probably would reproduce simply with unrestricted
> > > guest disabled, but that's beside the point.
> > >
> > > The issue is a flawed PAE-paging check in kvm_set_cr3(), which causes KVM
> > > to incorrectly load PDPTRs in 64-bit mode and inject a #GP. It's a sneaky
> > > little bugger because the "if (is_long_mode() ..." makes it appear to be
> > > correct at first glance.
> > >
> > > 	if (is_long_mode(vcpu) &&
> > > 	    (cr3 & rsvd_bits(cpuid_maxphyaddr(vcpu), 63)))
> > > 		return 1;
> > > 	else if (is_pae(vcpu) && is_paging(vcpu) &&    <--- needs !is_long_mode()
> > > 		 !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
> > > 		return 1;
> > >
> > > With unrestricted guest, KVM doesn't intercept writes to CR3 and so doesn't
> > > trigger the buggy code. This doesn't fail upstream because the offending
> > > code was refactored to encapsulate the PAE checks in a single helper,
> > > precisely to avoid this type of headache.
> > >
> > > commit bf03d4f9334728bf7c8ffc7de787df48abd6340e
> > > Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > Date: Thu Jun 6 18:52:44 2019 +0200
> > >
> > > KVM: x86: introduce is_pae_paging
> > >
> > > Checking for 32-bit PAE is quite common around code that fiddles with
> > > the PDPTRs. Add a function to compress all checks into a single
> > > invocation.
> > >
> > >
> > > Commit bf03d4f93347 ("KVM: x86: introduce is_pae_paging") doesn't apply
> > > cleanly to 4.19 or earlier because of the VMX file movement in 4.20. But,
> > > the relevant changes in x86.c do apply cleanly, and I've quadruple checked
> > > that the PAE checks in vmx.c are correct, i.e. applying the patch and
> > > ignoring the nested.c/vmx.c conflicts would be a viable lazy option.
> > >
> > > > Basically, booting this kernel as host and then running a KVM guest distro
> > > > or kernel fails in that guest kernel early in the boot phase, without any
> > > > error or other log to serial console or earlyprintk.
> > >
> > > ...
> > >
> > > >
> > > > [0]: https://bugzilla.kernel.org/show_bug.cgi?id=205441
> > > > [1]: models tested as problematic are: intel core2duo E8500; Xeon E5420; so
> > > > westmere, conroe and that stuff. AFAICT anything from about pre-2010 which
> > > > has VMX support (i.e. is 64bit based)
> > >
> > > Note, not Westmere, which has EPT and unrestricted guest. Xeon E5420 is
> > > Harpertown, a.k.a. Penryn, the shrink of Conroe.
> >
> >
> > Thanks for figuring this out, can you send us a patch that we can apply
> > to fix this issue in the stable tree?
>
> Can do. A custom backport will be needed for 4.20 and earlier, not just
> 4.19 and earlier. I misremembered when we did the VMX refactoring.
>
> For 5.0, 5.1 and 5.2, commit bf03d4f93347 can be applied directly.
5.0, 5.1, and 5.2 are all long end-of-life; they are not getting any
updates and no one should be using them, so there is nothing to worry
about there.
Look at the releases page on kernel.org to see the currently active
trees.
thanks,
greg k-h