[PATCH 3/4] KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
From: Paolo Bonzini
Date: Wed May 27 2026 - 08:07:18 EST
When L2 runs under nested NPT and uses PAE paging, KVM's cached PDPTRs
in mmu->pdptrs[] can hold stale or wrong values after nested
transitions and across migration restore, because both
nested_svm_load_cr3() and svm_get_nested_state_pages() only refresh
PDPTRs on the !nested_npt path.
The user-visible bug is on migration restore of an L2 running with nested
NPT and 32-bit PAE paging, if userspace uses KVM_SET_SREGS rather than
KVM_SET_SREGS2. In that case, load_pdptrs() leaves VCPU_EXREG_PDPTR
marked as available, and kvm_pdptr_read() will use a stale translation
that used L1 GPAs instead of L2 nGPAs. svm_get_nested_state_pages()
runs on first KVM_RUN but skips the refresh because nested_npt_enabled()
is true. The CPU itself reads L2's PDPTRs correctly from memory via
L1's NPT, but KVM-side walking of guest PAE page tables uses the bogus
cached values.
Unlike Intel's GUEST_PDPTR0..3 fields in the VMCS, SVM has no
VMCB-cached PDPTR state: the in-memory PDPTEs at the current CR3 are
the only source of truth, and svm_cache_reg(VCPU_EXREG_PDPTR) simply
reloads them from memory via load_pdptrs(). Clearing the avail
bit (and the dirty bit because !avail/dirty is invalid) to force
a reload when PDPTRs as needed fixes the bug.
Do the same for nested_svm_load_cr3()'s nested_npt branch, so that
the invariant "PDPTRs need reloading" is handled similarly for both
immediate and deferred loading.
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
arch/x86/kvm/kvm_cache_regs.h | 8 ++++++++
arch/x86/kvm/svm/nested.c | 27 ++++++++++++++++++---------
2 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 2ae492ad6412..6bae5db5a54e 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -77,6 +77,14 @@ static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
return test_bit(reg, vcpu->arch.regs_dirty);
}
+static inline void kvm_register_mark_for_reload(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ kvm_assert_register_caching_allowed(vcpu);
+ __clear_bit(reg, vcpu->arch.regs_avail);
+ __clear_bit(reg, vcpu->arch.regs_dirty);
+}
+
static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3d1fd1776e19..aa5a1d8ea136 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -680,9 +680,12 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3)))
return -EINVAL;
- if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) &&
- CC(!load_pdptrs(vcpu, cr3)))
- return -EINVAL;
+ if (reload_pdptrs && is_pae_paging(vcpu)) {
+ if (nested_npt)
+ kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR);
+ else if (CC(!load_pdptrs(vcpu, cr3)))
+ return -EINVAL;
+ }
vcpu->arch.cr3 = cr3;
@@ -2055,15 +2058,21 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
if (WARN_ON(!is_guest_mode(vcpu)))
return true;
- if (!vcpu->arch.pdptrs_from_userspace &&
- !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
+ if (is_pae_paging(vcpu)) {
/*
- * Reload the guest's PDPTRs since after a migration
- * the guest CR3 might be restored prior to setting the nested
- * state which can lead to a load of wrong PDPTRs.
+ * After migration, CR3 may have been restored before
+ * KVM_SET_NESTED_STATE, so the PDPTR load into mmu->pdptrs[]
+ * may have treated CR3 as an L1 GPA. For nNPT, drop the
+ * cache so the next access reloads them with the proper
+ * nGPA translation. For !nNPT, reload eagerly unless userspace
+ * already supplied authoritative PDPTRs via KVM_SET_SREGS2.
*/
- if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
+ if (nested_npt_enabled(to_svm(vcpu)))
+ kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR);
+ else if (!vcpu->arch.pdptrs_from_userspace &&
+ CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
return false;
+ }
if (!nested_svm_merge_msrpm(vcpu)) {
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
--
2.52.0