Re: [PATCH 1/3] KVM: nVMX: unwind PDPTR load if processor triggers a nested VMFail
From: Sean Christopherson
Date: Mon Jun 08 2026 - 23:33:09 EST

On Thu, Jun 04, 2026, Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 4690a4d23709..d612a5d071fc 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4947,6 +4947,7 @@ static inline u64 nested_vmx_get_vmcs01_guest_efer(struct vcpu_vmx *vmx)
>
> static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
> {
> + enum vm_entry_failure_code ignored;
> struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> struct vmx_msr_entry g, h;
> @@ -4984,20 +4985,19 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
> vmx_set_cr4(vcpu, vmcs_readl(CR4_READ_SHADOW));
>
> nested_ept_uninit_mmu_context(vcpu);
> - vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);
> - kvm_register_mark_available(vcpu, VCPU_REG_CR3);
>
> /*
> - * Use ept_save_pdptrs(vcpu) to load the MMU's cached PDPTRs
> - * from vmcs01 (if necessary). The PDPTRs are not loaded on
> - * VMFail, like everything else we just need to ensure our
> - * software model is up-to-date.
> + * Now that nested EPT has been disabled, load the MMU's CR3 and
> + * possibly PDPTRs from vmcs01 (if necessary). This should not
> + * happen for VMFail, but we get here if the check was caught by
> + * the processor and therefore the guest CR3 was loaded prematurely.
> */
> + kvm_mmu_unload(vcpu);
> + if (nested_vmx_load_cr3(vcpu, vmcs_readl(GUEST_CR3), false, !enable_ept, &ignored))
> + nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);

This isn't quite correct either. I mean, none of this is architecturally correct,
but this is less correct than the other incorrect code here :-)

To do this "right", KVM should snapshot the PDPTRs and shove them into the MMU,
without touching guest memory.

On a very related topic, I have a patch to stash CR3 in software instead of
abusing vmcs01.GUEST_CR3, as KVM fails to restore vmcs01.GUEST_CR3 to its proper
state if nested_vmx_enter_non_root_mode() bails after clobbering vmcs01.GUEST_CR3,
but before loading guest state. We could probably do the same thing for PDPTRs?
https://lore.kernel.org/all/20260603223418.1720035-3-seanjc@xxxxxxxxxx
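
For illustration only, a minimal user-space sketch of the snapshot-and-restore
idea described above. Every name here (struct toy_vcpu, toy_snapshot_l1_paging(),
toy_load_l2_paging(), toy_restore_l1_paging()) is hypothetical; none of this is
KVM code or the contents of the linked patch. The only point is that the unwind
path copies previously cached values back and never re-reads the PDPTEs from
guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_PDPTRS 4

/* Hypothetical stand-in for a vCPU's paging state; NOT a real KVM structure. */
struct toy_vcpu {
	/* Values the (toy) MMU currently uses. */
	uint64_t cr3;
	uint64_t pdptrs[NR_PDPTRS];

	/* Software snapshot of L1's values, taken before nested VM-Enter. */
	uint64_t saved_cr3;
	uint64_t saved_pdptrs[NR_PDPTRS];
	bool snapshot_valid;
};

/* Stash L1's CR3 and PDPTRs in software before anything gets clobbered. */
void toy_snapshot_l1_paging(struct toy_vcpu *v)
{
	v->saved_cr3 = v->cr3;
	memcpy(v->saved_pdptrs, v->pdptrs, sizeof(v->saved_pdptrs));
	v->snapshot_valid = true;
}

/* Model the "premature" load of L2 state that has to be unwound later. */
void toy_load_l2_paging(struct toy_vcpu *v, uint64_t cr3,
			const uint64_t pdptrs[NR_PDPTRS])
{
	v->cr3 = cr3;
	memcpy(v->pdptrs, pdptrs, sizeof(v->pdptrs));
}

/*
 * Unwind: restore L1's values from the snapshot.  Nothing here reads guest
 * memory, so the restore cannot fail the way a PDPTE reload could.
 */
void toy_restore_l1_paging(struct toy_vcpu *v)
{
	assert(v->snapshot_valid);

	v->cr3 = v->saved_cr3;
	memcpy(v->pdptrs, v->saved_pdptrs, sizeof(v->pdptrs));
	v->snapshot_valid = false;
}
```

In this toy model the restore has no failure path at all, so there would be
nothing to abort on; whether the real unwind can be made that simple is exactly
the open question above.
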
> if (enable_ept && is_pae_paging(vcpu))
> ept_save_pdptrs(vcpu);
>
> - kvm_mmu_reset_context(vcpu);
> -
> /*
> * This nasty bit of open coding is a compromise between blindly
> * loading L1's MSRs using the exit load lists (incorrect emulation
> --
> 2.52.0
>
>