Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS

From: Sean Christopherson

Date: Wed Feb 18 2026 - 19:13:42 EST

On Thu, Feb 12, 2026, Yosry Ahmed wrote:
> In the save/restore path, if KVM_SET_NESTED_STATE is performed before
> restoring REGS and/or SREGS , the values of CS and RIP used to
> initialize the vmcb02's NextRIP and soft interrupt tracking RIPs are
> incorrect.
>
> Recalculate them up after CS is set, or REGS are restored. This is only
> needed when a nested run is pending during restore. After L2 runs for
> the first time, any soft interrupts injected by L1 are already delivered
> or tracked by KVM separately for re-injection, so the CS and RIP values
> are no longer relevant.
>
> If KVM_SET_NESTED_STATE is performed after both REGS and SREGS are
> restored, it will just overwrite the fields.

Apparently I suggested this general idea, but ugh. :-)

> static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
> {
> kvm_register_mark_available(vcpu, reg);
> @@ -1826,6 +1844,8 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
> if (seg == VCPU_SREG_SS)
> /* This is symmetric with svm_get_segment() */
> svm->vmcb->save.cpl = (var->dpl & 3);
> + else if (seg == VCPU_SREG_CS)
> + svm_fixup_nested_rips(vcpu);
>
> vmcb_mark_dirty(svm->vmcb, VMCB_SEG);
> }
> @@ -5172,6 +5192,7 @@ struct kvm_x86_ops svm_x86_ops __initdata = {
> .get_rflags = svm_get_rflags,
> .set_rflags = svm_set_rflags,
> .get_if_flag = svm_get_if_flag,
> + .post_user_set_regs = svm_fixup_nested_rips,
>
> .flush_tlb_all = svm_flush_tlb_all,
> .flush_tlb_current = svm_flush_tlb_current,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index db3f393192d9..35fe1d337273 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> kvm_rip_write(vcpu, regs->rip);
> kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
>
> + kvm_x86_call(post_user_set_regs)(vcpu);

I especially don't love this callback. Aside from adding a new kvm_x86_ops hook,
I don't like that _any_ CS change triggers a fixup, whereas only userspace writes
to RIP trigger a fixup. That _should_ be a moot point, because neither CS nor RIP
should change while nested_run_pending is true, but I dislike the asymmetry.

I was going to suggest we instead react to RIP being dirty, but what if we take
it a step further? Somewhat of a crazy idea, but what happens if we simply wait
until just before VMRUN to set soft_int_csbase, soft_int_old_rip, and
soft_int_next_rip (when the guest doesn't have NRIPS)?

E.g. after patch 2, completely untested...

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index aec17c80ed73..6fc1b2e212d2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
if (is_evtinj_soft(vmcb02->control.event_inj)) {
svm->soft_int_injected = true;
- svm->soft_int_csbase = vmcb12_csbase;
- svm->soft_int_old_rip = vmcb12_rip;
+
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
svm->soft_int_next_rip = svm->nested.ctl.next_rip;
- else
- svm->soft_int_next_rip = vmcb12_rip;
}

/* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..358ec940ffc9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
return EXIT_FASTPATH_EXIT_USERSPACE;
}

+ if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
+ svm->soft_int_injected) {
+ svm->soft_int_csbase = svm->vmcb->save.cs.base;
+ svm->soft_int_old_rip = kvm_rip_read(vcpu);
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
+ svm->soft_int_next_rip = kvm_rip_read(vcpu);
+ }
+
sync_lapic_to_cr8(vcpu);

if (unlikely(svm->asid != svm->vmcb->control.asid)) {