Re: [PATCH v6 16/31] KVM: nSVM: Unify handling of VMRUN failures with proper cleanup

From: Yosry Ahmed

Date: Wed Feb 25 2026 - 17:44:58 EST


> @@ -939,22 +939,19 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> vmcb12->control.intercepts[INTERCEPT_WORD4],
> vmcb12->control.intercepts[INTERCEPT_WORD5]);
>
> -
> svm->nested.vmcb12_gpa = vmcb12_gpa;
>
> WARN_ON(svm->vmcb == svm->nested.vmcb02.ptr);
>
> enter_guest_mode(vcpu);
>
> + if (!nested_vmcb_check_save(vcpu, &svm->nested.save) ||
> + !nested_vmcb_check_controls(vcpu, &svm->nested.ctl))
> + return -EINVAL;
> +
> if (nested_npt_enabled(svm))
> nested_svm_init_mmu_context(vcpu);
>
> - nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
> -
> - svm_switch_vmcb(svm, &svm->nested.vmcb02);
> - nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> - nested_vmcb02_prepare_save(svm, vmcb12);
> -
> ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
> nested_npt_enabled(svm), from_vmrun);
> if (ret)

I think the above is wrong (as pointed out by an internal AI bot).
nested_vmcb02_prepare_save() also initializes architectural state to
L2's, moving it after nested_svm_load_cr3() means that
nested_svm_load_cr3() will use L1's architectural state (e.g. in
is_pae_paging(), but more importantly in kvm_init_mmu()).

The only clean-ish solution I can think of is to refactor loading L2's
architectural state out of nested_vmcb02_prepare_save(), and keep it
before nested_svm_load_cr3(). Then, also refactor restoring L1's
architectural state out of nested_svm_vmexit() and also do it in
nested_svm_vmrun_error_vmexit().

We can probably move it to __nested_svm_vmexit(), assuming nothing
else in nested_svm_vmexit() relies on it.

That being said, I am open to suggestions for better options.

> @@ -966,6 +963,17 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> return ret;
> }
>
> + /*
> + * Any VMRUN failure needs to happen before this point, such that the
> + * nested #VMEXIT is injected properly by nested_svm_vmrun_error_vmexit().
> + */
> +
> + nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
> +
> + svm_switch_vmcb(svm, &svm->nested.vmcb02);
> + nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> + nested_vmcb02_prepare_save(svm, vmcb12);
> +

Another problem here, hidden above the context lines is a call to
nested_svm_merge_msrpm(), which updates msrpm_base_pa in the active
VMCB, which should be vmcb02, but will be vmcb01 because we moved the
VMCB switch below it.

I think the easiest thing to do here is make nested_svm_merge_msrpm()
explicitly update msrpm_base_pa in vmcb02, I think this is probably
better anyway.