[PATCH 5.12 075/700] KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified

From: Greg Kroah-Hartman
Date: Mon Jul 12 2021 - 03:29:28 EST


From: Sean Christopherson <seanjc@xxxxxxxxxx>

commit 49c6f8756cdffeb9af1fbcb86bacacced26465d7 upstream.

Invalidate all MMUs' roles after a CPUID update to force reinitizliation
of the MMU context/helpers. Despite the efforts of commit de3ccd26fafc
("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
there are still a handful of CPUID-based properties that affect MMU
behavior but are not incorporated into mmu_role. E.g. 1gb hugepage
support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
factor into the guest's reserved PTE bits.

The obvious alternative would be to add all such properties to mmu_role,
but doing so provides no benefit over simply forcing a reinitialization
on every CPUID update, as setting guest CPUID is a rare operation.

Note, reinitializing all MMUs after a CPUID update does not fix all of
KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID
properties, which means that a vCPU can reuse shadow pages that should
not exist for the new vCPU model, e.g. that map GPAs that are now illegal
(due to MAXPHYADDR changes) or that set bits that are now reserved
(PAGE_SIZE for 1gb pages), etc...

Tracking the relevant CPUID properties in kvm_mmu_page_role would address
the majority of problems, but fully tracking that much state in the
shadow page role comes with an unpalatable cost as it would require a
non-trivial increase in KVM's memory footprint. The GBPAGES case is even
worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
support in the hardware page walker, i.e. it's a virtualization hole that
can't be closed when using TDP.

In other words, resetting the MMU after a CPUID update is largely a
superficial fix. But, it will allow reverting the tracking of MAXPHYADDR
in the mmu_role, and that case in particular needs to mostly work because
KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
is supported. For cases where KVM botches guest behavior, the damage is
limited to that guest. But for the shadow_root_level, a misconfigured
MMU can cause KVM to incorrectly access memory, e.g. due to walking off
the end of its shadow page tables.

Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Cc: Yu Zhang <yu.c.zhang@xxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Message-Id: <20210622175739.3610207-7-seanjc@xxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 6 +++---
arch/x86/kvm/mmu/mmu.c | 12 ++++++++++++
3 files changed, 16 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1433,6 +1433,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask
u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
u64 acc_track_mask, u64 me_mask);

+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot,
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -185,10 +185,10 @@ static void kvm_vcpu_after_set_cpuid(str
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);

/*
- * Except for the MMU, which needs to be reset after any vendor
- * specific adjustments to the reserved GPA bits.
+ * Except for the MMU, which needs to do its thing any vendor specific
+ * adjustments to the reserved GPA bits.
*/
- kvm_mmu_reset_context(vcpu);
+ kvm_mmu_after_set_cpuid(vcpu);
}

static int is_efer_nx(void)
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4846,6 +4846,18 @@ kvm_mmu_calc_root_page_role(struct kvm_v
return role.base;
}

+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Invalidate all MMU roles to force them to reinitialize as CPUID
+ * information is factored into reserved bit calculations.
+ */
+ vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
+ vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
+ vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
+ kvm_mmu_reset_context(vcpu);
+}
+
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
kvm_mmu_unload(vcpu);