Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS

From: Yosry Ahmed

Date: Fri Feb 20 2026 - 15:28:17 EST

> > > svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> > > if (is_evtinj_soft(vmcb02->control.event_inj)) {
> > > svm->soft_int_injected = true;
> > > - svm->soft_int_csbase = vmcb12_csbase;
> > > - svm->soft_int_old_rip = vmcb12_rip;
> > > +
> > > if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > > svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> >
> > Why not move this too?
>
> For the same reason I think we should keep
>
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
>
> where it is. When NRIPS is exposed to the guest, the incoming nested state is
> the one and only source of truth. By keeping the code different, we'd effectively
> be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.

I see, makes sense. I like the fact that we should be able to completely
drop vmcb12_rip and vmcb12_csbase with this (unless we want to start
using it for the bus_lock_rip check), which will also remove the need
for patch 2.

I guess the only hiccup will be patch 1. We'll end up with this code in
nested_vmcb02_prepare_control():

if boot_cpu_has(X86_FEATURE_NRIPS)) {
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending)
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
}

, and this code pre-VMRUN:

if boot_cpu_has(X86_FEATURE_NRIPS)) {
if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) &&
svm->nested.nested_run_pending)
vmcb02->control.next_rip = kvm_rip_read(vcpu);
}

It seems a bit fragile that the 'if' is somewhere and the 'else' (or the
opposite condition) is somewhere else. They could get out of sync. Maybe
a helper will make this better:

/* Huge comment */
bool nested_svm_use_vmcb12_next_rip(struct kvm_vcpu *vcpu)
{
return guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending;
}

or maybe the name makes more sense as the negative:
nested_svm_use_rip_as_next_rip()?

I don't like both names..

Aha! Maybe it's actually better to have the helper set the NextRip
directly?

/* Huge comment */
u64 nested_vmcb02_update_next_rip(struct kvm_vcpu *vcpu)
{
u64 next_rip;

if (WARN_ON_ONCE(svm->vmcb != svm->nested.vmcb02.ptr))
return;

if (!boot_cpu_has(X86_FEATURE_NRIPS))
return;

if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS) ||
!svm->nested.nested_run_pending)
next_rip = svm->nested.ctl.next_rip;
else
next_rip = kvm_rip_read(vcpu);

svm->vmcb->control.next_rip = next_rip;
}

Then, we just call this from nested_vmcb02_prepare_control() and
svm_vcpu_run() (with the soft IRQ stuff). In some cases we'll put the
wrong thing in vmcb02 and then fix it up later, but I think that's fine
(that's what the patch is doing anyway).

However, we lose the fact that the whole thing is guarded by
nested_run_pending, so performance in svm_vcpu_run() could suffer. We
could leave it all guarded by nested_run_pending, as
nested_vmcb02_update_next_rip() should do nothing in svm_vcpu_run()
otherwise, but then the caller is depending on implementation details of
the helper.

Maybe just put it in svm_prepare_switch_to_guest() to begin with and not
guard it with nested_run_pending?

>
> > > - else
> > > - svm->soft_int_next_rip = vmcb12_rip;
> > > }
> > >
> > > /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 8f8bc863e214..358ec940ffc9 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> > > return EXIT_FASTPATH_EXIT_USERSPACE;
> > > }
> > >
> > > + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> > > + svm->soft_int_injected) {
> > > + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > > + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > > + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > > + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > > + }
> > > +
> >
> > I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
> > maybe we can refactor them later to pre-run and post-run nested
> > callbacks? Anyway, not a big deal, definitely an improvement over the
> > current patch assuming we can figure out how to fix next_rip.
>
> I don't love it either, but I want to (a) avoid unnecessarily overwriting the
> fields, e.g. if KVM never actually does VMRUN and (b) minimize the probability
> of consuming a bad RIP.
>
> In practice, I would expect the nested_run_pending check to be correctly predicted
> the overwhelming majority of the time, i.e. I don't anticipate performance issues
> due to putting the code in the hot path.
>
> If we want to try and move the update out of svm_vcpu_run(), then we shouldn't
> need generic pre/post callbacks for nested, svm_prepare_switch_to_guest() is the
> right fit.

I prefer putting it in svm_prepare_switch_to_guest() to begin with,
because it gives us more flexibility (see above).