Re: [PATCH 3/4] KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
From: Sean Christopherson
Date: Wed May 27 2026 - 19:53:35 EST
On Wed, May 27, 2026, Paolo Bonzini wrote:
> When L2 runs under nested NPT and uses PAE paging, KVM's cached PDPTRs
> in mmu->pdptrs[] can hold stale or wrong values after nested
> transitions and across migration restore, because both
> nested_svm_load_cr3() and svm_get_nested_state_pages() only refresh
> PDPTRs on the !nested_npt path.
>
> The user-visible bug is on migration restore of an L2 running with nested
> NPT and 32-bit PAE paging, if userspace uses KVM_SET_SREGS rather than
> KVM_SET_SREGS2. In that case, load_pdptrs() leaves VCPU_EXREG_PDPTR
> marked as available, and kvm_pdptr_read() will use a stale translation
> that used L1 GPAs instead of L2 nGPAs. svm_get_nested_state_pages()
> runs on first KVM_RUN but skips the refresh because nested_npt_enabled()
> is true. The CPU itself reads L2's PDPTRs correctly from memory via
> L1's NPT, but KVM-side walking of guest PAE page tables uses the bogus
> cached values.
>
> Unlike Intel's GUEST_PDPTR0..3 fields in the VMCS, SVM has no
> VMCB-cached PDPTR state: the in-memory PDPTEs at the current CR3 are
> the only source of truth, and svm_cache_reg(VCPU_EXREG_PDPTR) simply
> reloads them from memory via load_pdptrs(). Clearing the avail
> bit (and the dirty bit because !avail/dirty is invalid) to force
> a reload when PDPTRs as needed fixes the bug.
>
> Do the same for nested_svm_load_cr3()'s nested_npt branch, so that
> the invariant "PDPTRs need reloading" is handled similarly for both
> immediate and deferred loading.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> arch/x86/kvm/kvm_cache_regs.h | 8 ++++++++
> arch/x86/kvm/svm/nested.c | 27 ++++++++++++++++++---------
> 2 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 2ae492ad6412..6bae5db5a54e 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -77,6 +77,14 @@ static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
> return test_bit(reg, vcpu->arch.regs_dirty);
> }
>
> +static inline void kvm_register_mark_for_reload(struct kvm_vcpu *vcpu,
> + enum kvm_reg reg)
I would rather use kvm_clear_available_registers() than add yet another API,
which isn't even a good fit here since SVM never expects the PDPTRs to be dirty.
Though I think it's a moot point, because nSVM should be clearing *all*
lazy-loaded registers. It just so happens that PDPTRs are the only such "register".
I haven't checked to see if this would actually be correct, I'm just mimicking
the nVMX code. But conceptually, I think we want something like so:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e74fcde6155e..0c6ab00766b1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1303,6 +1303,8 @@ void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
{
svm->current_vmcb = target_vmcb;
svm->vmcb = target_vmcb->ptr;
+
+ kvm_clear_available_registers(&svm->vcpu, SVM_REGS_LAZY_LOAD_SET);
}
static int svm_vcpu_precreate(struct kvm *kvm)