Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS

From: Yosry Ahmed

Date: Wed Feb 18 2026 - 19:26:59 EST


> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index db3f393192d9..35fe1d337273 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12112,6 +12112,8 @@ static void __set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
> > kvm_rip_write(vcpu, regs->rip);
> > kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);
> >
> > + kvm_x86_call(post_user_set_regs)(vcpu);
>
> I especially don't love this callback. Aside from adding a new kvm_x86_ops hook,
> I don't like that _any_ CS change triggers a fixup, whereas only userspace writes
> to RIP trigger a fixup. That _should_ be a moot point, because neither CS nor RIP
> should change while nested_run_pending is true, but I dislike the asymmetry.
>
> I was going to suggest we instead react to RIP being dirty, but what if we take
> it a step further? Somewhat of a crazy idea, but what happens if we simply wait
> until just before VMRUN to set soft_int_csbase, soft_int_old_rip, and
> soft_int_next_rip (when the guest doesn't have NRIPS)?

I generally like this idea. I considered it briefly but was worried
about how much of a behavioral change it would introduce; that was
probably before I convinced myself the problem only exists with
nested_run_pending, though.

That being said...

>
> E.g. after patch 2, completely untested...
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index aec17c80ed73..6fc1b2e212d2 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,

Above the context lines we have:

/*
* next_rip is consumed on VMRUN as the return address pushed on the
* stack for injected soft exceptions/interrupts. If nrips is exposed
* to L1, take it verbatim from vmcb12. If nrips is supported in
* hardware but not exposed to L1, stuff the actual L2 RIP to emulate
* what a nrips=0 CPU would do (L1 is responsible for advancing RIP
* prior to injecting the event).
*/
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
else if (boot_cpu_has(X86_FEATURE_NRIPS))
vmcb02->control.next_rip = vmcb12_rip;

The same bug affects vmcb02->control.next_rip when the guest doesn't
have NRIPS, and I don't think we want to move part of the vmcb02
initialization to just before VMRUN as well. We could keep the
initialization here and overwrite it before VMRUN if needed, but that's
also ugly.
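
FWIW, if we did go the overwrite route, I think it'd only be a couple
more lines in your svm_vcpu_run() hunk below. Completely untested sketch
(relying on svm->vmcb being vmcb02 while in guest mode):

```c
	if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
	    svm->soft_int_injected) {
		svm->soft_int_csbase = svm->vmcb->save.cs.base;
		svm->soft_int_old_rip = kvm_rip_read(vcpu);
		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS)) {
			svm->soft_int_next_rip = kvm_rip_read(vcpu);
			/*
			 * vmcb02->control.next_rip was stuffed with the
			 * (possibly stale) vmcb12_rip at nested VMRUN;
			 * refresh it from the current RIP as well.
			 */
			if (boot_cpu_has(X86_FEATURE_NRIPS))
				svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
		}
	}
```

But as said above, splitting the vmcb02 setup like that isn't pretty.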

> svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> if (is_evtinj_soft(vmcb02->control.event_inj)) {
> svm->soft_int_injected = true;
> - svm->soft_int_csbase = vmcb12_csbase;
> - svm->soft_int_old_rip = vmcb12_rip;
> +
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->soft_int_next_rip = svm->nested.ctl.next_rip;

Why not move this too?

> - else
> - svm->soft_int_next_rip = vmcb12_rip;
> }
>
> /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8f8bc863e214..358ec940ffc9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> return EXIT_FASTPATH_EXIT_USERSPACE;
> }
>
> + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> + svm->soft_int_injected) {
> + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> + }
> +

I generally dislike adding more is_guest_mode() special cases to
svm_vcpu_run(); maybe we can refactor these later into pre-run and
post-run nested callbacks? Anyway, it's not a big deal, and definitely
an improvement over the current patch, assuming we can figure out how
to fix next_rip.
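
E.g. (purely illustrative, the helper name is made up) the whole block
could live in nested.c:

```c
static void nested_svm_pre_run(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	if (!svm->nested.nested_run_pending || !svm->soft_int_injected)
		return;

	/* Snapshot the event-injection state just before VMRUN. */
	svm->soft_int_csbase = svm->vmcb->save.cs.base;
	svm->soft_int_old_rip = kvm_rip_read(vcpu);
	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
		svm->soft_int_next_rip = kvm_rip_read(vcpu);
}
```

and svm_vcpu_run() would just do
'if (is_guest_mode(vcpu)) nested_svm_pre_run(vcpu);', keeping the
nested details out of the run loop. Not a blocker for this series
though.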

> sync_lapic_to_cr8(vcpu);
>
> if (unlikely(svm->asid != svm->vmcb->control.asid)) {