Re: [RFC PATCH 4/5] KVM: SVM: Recalculate nested RIPs after restoring REGS/SREGS
From: Sean Christopherson
Date: Fri Feb 20 2026 - 12:07:29 EST
On Thu, Feb 19, 2026, Yosry Ahmed wrote:
> > E.g. after patch 2, completely untested...
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index aec17c80ed73..6fc1b2e212d2 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -863,12 +863,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>
> Above the context lines we have:
>
> /*
> * next_rip is consumed on VMRUN as the return address pushed on the
> * stack for injected soft exceptions/interrupts. If nrips is exposed
> * to L1, take it verbatim from vmcb12. If nrips is supported in
> * hardware but not exposed to L1, stuff the actual L2 RIP to emulate
> * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
> * prior to injecting the event).
> */
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> else if (boot_cpu_has(X86_FEATURE_NRIPS))
> vmcb02->control.next_rip = vmcb12_rip;
>
> The same bug affects vmcb02->control.next_rip when the guest doesn't
> have NRIPS. I think we don't want to move part of the vmcb02
> initialization before VMRUN too. We can keep the initialization here and
> overwrite it before VMRUN if needed, but that's just also ugh..
Aha! I knew I was missing something, but I couldn't quite get my brain to figure
out what.
I don't have a super strong preference as to copying the code or moving it
wholesale. Though I would say if the pre-VMRUN logic is _identical_ (and I think
it is?), then we move it, and simply update the comment in
nested_vmcb02_prepare_control() to call that out.
> > svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> > if (is_evtinj_soft(vmcb02->control.event_inj)) {
> > svm->soft_int_injected = true;
> > - svm->soft_int_csbase = vmcb12_csbase;
> > - svm->soft_int_old_rip = vmcb12_rip;
> > +
> > if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > svm->soft_int_next_rip = svm->nested.ctl.next_rip;
>
> Why not move this too?
For the same reason I think we should keep
if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
vmcb02->control.next_rip = svm->nested.ctl.next_rip;
where it is. When NRIPS is exposed to the guest, the incoming nested state is
the one and only source of truth. By keeping the code different, we'd effectively
be documenting that the host.NRIPS+!guest.NRIPS case is the anomaly.
> > - else
> > - svm->soft_int_next_rip = vmcb12_rip;
> > }
> >
> > /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 8f8bc863e214..358ec940ffc9 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4322,6 +4322,14 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> > return EXIT_FASTPATH_EXIT_USERSPACE;
> > }
> >
> > + if (is_guest_mode(vcpu) && svm->nested.nested_run_pending &&
> > + svm->soft_int_injected) {
> > + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> > + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> > + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> > + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> > + }
> > +
>
> I generally dislike adding more is_guest_mode() stuff in svm_vcpu_run(),
> maybe we can refactor them later to pre-run and post-run nested
> callbacks? Anyway, not a big deal, definitely an improvement over the
> current patch assuming we can figure out how to fix next_rip.
I don't love it either, but I want to (a) avoid unnecessarily overwriting the
fields, e.g. if KVM never actually does VMRUN and (b) minimize the probability
of consuming a bad RIP.
In practice, I would expect the nested_run_pending check to be correctly predicted
the overwhelming majority of the time, i.e. I don't anticipate performance issues
due to putting the code in the hot path.
If we want to try and move the update out of svm_vcpu_run(), then we shouldn't
need generic pre/post callbacks for nested, svm_prepare_switch_to_guest() is the
right fit.