[PATCH 5.10 066/126] KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit

From: Greg Kroah-Hartman
Date: Mon Apr 05 2021 - 05:15:22 EST


From: Paolo Bonzini <pbonzini@xxxxxxxxxx>

commit 3c346c0c60ab06a021d1c0884a0ef494bc4ee3a7 upstream.

Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
by avoiding that EFER.SVME is cleared. This is problematic
because svm_set_efer frees the data structures for nested
virtualization if EFER.SVME is cleared.

Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).

Cc: stable@xxxxxxxxxxxxxxx
Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state")
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
arch/x86/kvm/svm/nested.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -251,6 +251,13 @@ static bool nested_vmcb_check_save(struc
struct kvm_vcpu *vcpu = &svm->vcpu;
bool vmcb12_lma;

+ /*
+ * FIXME: these should be done after copying the fields,
+ * to avoid TOC/TOU races. For these save area checks
+ * the possible damage is limited since kvm_set_cr0 and
+ * kvm_set_cr4 handle failure; EFER_SVME is an exception
+ * so it is force-set later in nested_prepare_vmcb_save.
+ */
if ((vmcb12->save.efer & EFER_SVME) == 0)
return false;

@@ -396,7 +403,14 @@ static void nested_prepare_vmcb_save(str
svm->vmcb->save.gdtr = vmcb12->save.gdtr;
svm->vmcb->save.idtr = vmcb12->save.idtr;
kvm_set_rflags(&svm->vcpu, vmcb12->save.rflags);
- svm_set_efer(&svm->vcpu, vmcb12->save.efer);
+
+ /*
+ * Force-set EFER_SVME even though it is checked earlier on the
+ * VMCB12, because the guest can flip the bit between the check
+ * and now. Clearing EFER_SVME would call svm_free_nested.
+ */
+ svm_set_efer(&svm->vcpu, vmcb12->save.efer | EFER_SVME);
+
svm_set_cr0(&svm->vcpu, vmcb12->save.cr0);
svm_set_cr4(&svm->vcpu, vmcb12->save.cr4);
svm->vmcb->save.cr2 = svm->vcpu.arch.cr2 = vmcb12->save.cr2;
@@ -1207,6 +1221,8 @@ static int svm_set_nested_state(struct k
*/
if (!(save->cr0 & X86_CR0_PG))
goto out_free;
+ if (!(save->efer & EFER_SVME))
+ goto out_free;

/*
* All checks done, we can enter guest mode. L1 control fields