Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

From: Sean Christopherson
Date: Fri Oct 09 2020 - 11:31:05 EST


On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
> 09.10.2020 07:04, Sean Christopherson wrote:
> >>Hmm. But at least it was lying
> >>similarly on AMD and Intel CPUs. :)
> >>So I was able to reproduce the problems
> >>myself.
> >>Do you mean that any AMD tests are now useless, and we need to proceed with
> >>Intel tests only?
> >For anything VMXE related, yes.
>
> What would be the expected behaviour on Intel, if it is set? Any difference
> with AMD?

On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'. Note,
'nested' is enabled by default on kernel 5.0 and later.
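A minimal C sketch of the userspace side, assuming an x86 host and a vCPU fd already obtained via KVM_CREATE_VCPU. KVM_GET_SREGS, KVM_SET_SREGS and struct kvm_sregs are the real KVM API; the helper name try_set_cr4_vmxe and the locally defined CR4_VMXE constant (bit 13) are illustrative. Per the above, the write should succeed on an Intel host with nested=1 and be rejected on AMD.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* CR4.VMXE is bit 13; defined locally for illustration. */
#define CR4_VMXE (1ULL << 13)

/*
 * Hypothetical helper: read the vCPU's special registers, set CR4.VMXE
 * and write them back. Returns 0 on success, -errno on failure.
 */
int try_set_cr4_vmxe(int vcpu_fd)
{
    struct kvm_sregs sregs;

    if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
        return -errno;

    sregs.cr4 |= CR4_VMXE;

    /* Expected to fail on AMD, or on Intel with kvm_intel nested=0. */
    if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0)
        return -errno;

    return 0;
}
```

Returning -errno keeps the reason for a rejection visible to the caller instead of collapsing every failure into a bare -1.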

With AMD, setting CR4.VMXE=1 is never allowed, as AMD doesn't support VMX;
AMD's virtualization solution is called SVM (Secure Virtual Machine). KVM
doesn't support nesting VMX within SVM or vice versa.
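On the Intel side, as noted earlier in the thread, CR4.VMXE must be 1 in hardware while VMX is on, so KVM forces vmcs.GUEST_CR4.VMXE=1 and the guest sees a shadow value when it reads CR4. The simplified model below assumes only VMXE is host-owned (KVM's real CR4 guest/host mask covers more bits). It follows the SDM rule that a guest read of CR4 takes bits set in the guest/host mask from the read shadow and all other bits from the real register. Function names are hypothetical.

```c
#include <stdint.h>

#define CR4_VME  (1ULL << 0)
#define CR4_VMXE (1ULL << 13)

/* Value loaded into vmcs.GUEST_CR4: VMXE forced on regardless of the guest's value. */
uint64_t hw_guest_cr4(uint64_t guest_cr4)
{
    return guest_cr4 | CR4_VMXE;
}

/* Guest MOV from CR4: host-owned bits come from the read shadow, the rest from hardware. */
uint64_t guest_visible_cr4(uint64_t hw_cr4, uint64_t read_shadow,
                           uint64_t guest_host_mask)
{
    return (hw_cr4 & ~guest_host_mask) | (read_shadow & guest_host_mask);
}
```

For example, a guest that writes CR4 = VME (VMXE clear) runs with VME|VMXE in hardware but still reads back plain VME, which is why stuffing the shadow VMXE bit from userspace changes nothing about how the guest actually runs.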

> >>Then, an additional question.
> >>On old Intel CPUs we needed to set VMXE in the guest to make it work in
> >>nested-guest mode.
> >>Is it still needed even with your patches?
> >>Or will nested-guest mode now work even on older Intel CPUs, with KVM
> >>setting VMXE for us itself when needed?
> >I'm struggling to even come up with a theory as to how setting VMXE from
> >userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
> >anything.
> >
> >CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
> >the guest. But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
> >the guest's actual value (the guest sees a shadow value when it reads CR4).
> >
> >And unless I grossly misunderstand dosemu2, it's not doing anything related to
> >nested virtualization, i.e. stuffing VMXE=1 into the guest's shadow value
> >should have absolutely zero impact.
> >
> >More than likely, VMXE was a red herring.
>
> Yes, it was. :( (as you can see from the end of the GitHub thread)
>
>
> > Given that the reporter is also
> >seeing the same bug on bare metal after moving to kernel 5.4, odds are good
> >the issue is related to unrestricted_guest=n and has nothing to do with nVMX.
>
> But we do not use unrestricted guest.
> We use v86 under KVM.

Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
segmentation checks that apply if and only if unrestricted_guest=0. Long story
short, without a deep audit, it's basically impossible to rule out a dependency
on unrestricted guest since you're playing around with v86.
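As one concrete example of a v86-specific segmentation constraint, the Intel SDM's VM-entry guest-state checks require that, when RFLAGS.VM=1, each of CS, SS, DS, ES, FS and GS have base = selector << 4, limit 0xFFFF and access rights 0xF3. KVM applies a similar software check to emulated real-mode segments when unrestricted_guest=0. This is a sketch, not KVM's code; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one guest segment register. */
struct seg {
    uint16_t selector;
    uint64_t base;
    uint32_t limit;
    uint32_t ar;    /* VMCS-format access rights */
};

/* Shape every CS/SS/DS/ES/FS/GS must have when the guest enters with RFLAGS.VM=1. */
bool v86_segment_ok(const struct seg *s)
{
    return s->base == ((uint64_t)s->selector << 4) &&
           s->limit == 0xFFFF &&
           s->ar == 0xF3;   /* present, DPL=3, read/write accessed data */
}
```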

> The only other effect of setting VMXE was clearing VME, which shouldn't
> affect anything either, right?

Hmm, clearing VME would mean that exceptions/interrupts within the guest would
trigger a switch out of v86 and into vanilla protected mode. v86 and PM have
different consistency checks, particularly for segmentation, so it's plausible
that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
guest state for v86.
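To make the VME difference concrete, the sketch below models the Intel SDM's software-interrupt handling methods for virtual-8086 mode. It covers only INT n: exceptions always go to protected-mode handlers, and for maskable hardware interrupts VME only adds VIF/VIP support. With CR4.VME=0 every INT n leaves v86 for a protected-mode handler (directly when IOPL=3, via #GP otherwise); with VME=1, an interrupt whose bit is clear in the TSS redirection bitmap stays in v86. The enum and function names are hypothetical.

```c
#include <stdbool.h>

enum int_dest {
    TO_PM_HANDLER,      /* protected-mode interrupt handler */
    TO_PM_GP_HANDLER,   /* #GP to the protected-mode monitor */
    TO_V86_HANDLER      /* redirected to the 8086 program's own handler */
};

/* Destination of a software interrupt (INT n) executed in v86 mode. */
enum int_dest v86_soft_int_dest(bool cr4_vme, unsigned int iopl,
                                bool redir_bit_set)
{
    if (!cr4_vme)       /* SDM methods 1 and 2 */
        return iopl == 3 ? TO_PM_HANDLER : TO_PM_GP_HANDLER;
    if (!redir_bit_set) /* SDM methods 5 and 6 */
        return TO_V86_HANDLER;
    /* SDM methods 4 and 3 */
    return iopl == 3 ? TO_PM_HANDLER : TO_PM_GP_HANDLER;
}
```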