Re: [PATCH v3 2/4] KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN
From: Sean Christopherson
Date: Wed Jan 19 2022 - 12:40:28 EST
On Wed, Jan 19, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> >> @@ -313,6 +335,20 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >>
> >> __kvm_update_cpuid_runtime(vcpu, e2, nent);
> >> + /*
> >> + * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> >> + * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> >> + * tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
> >> + * faults due to reusing SPs/SPTEs. In practice no sane VMM mucks with
> >> + * the core vCPU model on the fly. It would've been better to forbid any
> >> + * KVM_SET_CPUID{,2} calls after KVM_RUN altogether but unfortunately
> >> + * some VMMs (e.g. QEMU) reuse vCPU fds for CPU hotplug/unplug and do
> >> + * KVM_SET_CPUID{,2} again. To support this legacy behavior, check
> >> + * whether the supplied CPUID data is equal to what's already set.
> >
> > This is misleading/wrong. KVM_RUN isn't the only problematic ioctl(),
>
> Well, it wasn't me who wrote the comment about KVM_RUN :-) My addition
> can be improved of course.
Don't^W^W^W shoot the messenger? :-)
> > it's just the one that we decided to use to detect that userspace is
> > being stupid. And forbidding KVM_SET_CPUID after KVM_RUN (or even all
> > problematic ioctls()) wouldn't solve problem as providing different
> > CPUID configurations for vCPUs in a VM will also cause the MMU to fall
> > on its face.
>
> True, but how do we move forward? We can either let userspace do stupid
> things and (potentially) create hard-to-debug problems or we try to
> cover at least some use-cases with checks (like the one we introduce
> here).
I completely agree, and if this were an internal API or a KVM module param I
would be jumping all over the idea of restricing how it can be used. What I don't
like is bolting on restrictions to a set of ioctl()s that have been in use for years.
> Different CPUID configurations for different vCPUs is actually an
> interesting case. It makes me (again) think about the
> allowlist/blocklist approaches: we can easily enhance the
> 'vcpu->arch.last_vmentry_cpu != -1' check below and start requiring
> CPUIDs to [almost] match. The question then is how to change CPUID for a
> multi-vCPU guest as it will become effectively forbidden. BTW, is there
> a good use-case for changing CPUIDs besides testing purposes?
No idea. That's a big reason for my concern; we've really only got input from
QEMU, and there are plenty of users beyond QEMU.
> >> + if (vcpu->arch.last_vmentry_cpu != -1)
> >> + return kvm_cpuid_check_equal(vcpu, e2, nent);
> >
> > And technically, checking last_vmentry_cpu doesn't forbid changing CPUID after
> > KVM_RUN, it forbids changing CPUID after successfully entering the guest (or
> > emulating instructions on VMX).
> >
> > I realize I'm being very pedantic, as a well-intended userspace is obviously not
> > going to change CPUID after -EINTR or whatever. But I do want to highlight that
> > this approach is by no means bulletproof, and that what is/isn't allowed with
> > respect to guest CPUID isn't necessarily associated with what is/isn't "safe".
> > In other words, this check doesn't guarantee that userspace can't misuse KVM_SET_CPUID,
> > and on the flip side it disallows using KVM_SET_CPUID in ways that are perfectly ok
> > (if userspace is careful and deliberate).
>
> All true but I don't see a 'bulletproof' approach here unless we start
> designing new KVM API for userspace and I don't think the problem here
> is a good enough justification for that.
Yeah, agreed.
> Another approach would be to name the "don't change CPUIDs after KVM_RUN at
> will" comment in the code a good enough sentinel and hope that no real world
> userspace actually does such things.
I'm ok with that, so long as the KVM code is kept simple (a single memcmp()
qualifies) and we are quick to revert the whole thing if it turns out there's an
existing user and/or valid use case.