Re: [PATCH 3/4] KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions

From: Paolo Bonzini

Date: Thu May 28 2026 - 04:44:16 EST


On 5/28/26 01:53, Sean Christopherson wrote:
I would rather use kvm_clear_available_registers() than add yet another API,
which isn't even a good fit here since SVM never expects the PDPTRs to be dirty.

Though I think it's a moot point, because nSVM should be clearing *all*
lazy-loaded registers. It just so happens that PDPTRs are the only such "register".

I haven't checked to see if this would actually be correct, I'm just mimicking
the nVMX code. But conceptually, I think we want something like so:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e74fcde6155e..0c6ab00766b1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1303,6 +1303,8 @@ void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
{
svm->current_vmcb = target_vmcb;
svm->vmcb = target_vmcb->ptr;
+
+ kvm_clear_available_registers(&svm->vcpu, SVM_REGS_LAZY_LOAD_SET);
}
static int svm_vcpu_precreate(struct kvm *kvm)

I actually started with something like that. nSVM should indeed be clearing all other lazy-loaded registers too (except there isn't any). I ended up doing it the other way for two reasons, both boiling down to PDPTRs being weird.

First, the PDPTRs are cached processor state but not a field of the VMCB; changing the VMCB should have no effect on them. For SVM, changes to their cache state are purely a result of writes to CR3 or CR4.PAE.

Thus it seemed more explicit to single out the PAE+nNPT case when loading CR3:

if (nested_npt)
kvm_register_mark_for_reload
else if (!load_pdptrs(...))
return -EINVAL;

instead of the slightly more opaque

if (!nested_npt && !load_pdptrs(...))
return -EINVAL;

Second, with nNPT it's not even entirely correct to cache them at all. Strictly speaking they should never even be marked available, that way you'd go through load_pdptrs() on every PAE memory access. Thus kvm_register_mark_for_reload() could also be tied to the CR4 write, but in any case not to the VMCB switch.

I'd rather keep it like this but yeah, it can be done either way. If there were other registers in the lazy load set, relying on svm_switch_vmcb() would make sense but you'd want to add a comment above the two instances of "if (!nested_npt && !load_pdptrs())".

Paolo