Re: [PATCH v3 7/8] KVM: nSVM: Delay setting soft IRQ RIP tracking fields until vCPU run
From: Yosry Ahmed
Date: Wed Mar 04 2026 - 12:52:07 EST
On Tue, Feb 24, 2026 at 5:00 PM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
>
> In the save+restore path, when restoring nested state, the values of RIP
> and CS base passed into nested_vmcb02_prepare_control() are mostly
> incorrect. They are both pulled from the vmcb02. For CS base, the value
> is only correct if system regs are restored before nested state. The
> value of RIP is whatever the vCPU had in vmcb02 before restoring nested
> state (zero on a freshly created vCPU).
>
> Instead, take a similar approach to NextRIP, and delay initializing the
> RIP tracking fields until shortly before the vCPU is run, to make sure
> the most up-to-date values of RIP and CS base are used regardless of
> KVM_SET_SREGS, KVM_SET_REGS, and KVM_SET_NESTED_STATE's relative
> ordering.
>
> Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE")
> CC: stable@xxxxxxxxxxxxxxx
> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Signed-off-by: Yosry Ahmed <yosry@xxxxxxxxxx>
> ---
> arch/x86/kvm/svm/nested.c | 17 ++++++++---------
> arch/x86/kvm/svm/svm.c | 10 ++++++++++
> 2 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index dcd4a8eb156f2..4499241b4e401 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -742,9 +742,7 @@ static bool is_evtinj_nmi(u32 evtinj)
> return type == SVM_EVTINJ_TYPE_NMI;
> }
>
> -static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> - unsigned long vmcb12_rip,
> - unsigned long vmcb12_csbase)
> +static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
> {
> u32 int_ctl_vmcb01_bits = V_INTR_MASKING_MASK;
> u32 int_ctl_vmcb12_bits = V_TPR_MASK | V_IRQ_INJECTION_BITS_MASK;
> @@ -856,14 +854,15 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> vmcb02->control.next_rip = svm->nested.ctl.next_rip;
>
> svm->nmi_l1_to_l2 = is_evtinj_nmi(vmcb02->control.event_inj);
> +
> + /*
> + * soft_int_csbase, soft_int_old_rip, and soft_int_next_rip (if L1
> + * doesn't have NRIPS) are initialized later, before the vCPU is run.
> + */
> if (is_evtinj_soft(vmcb02->control.event_inj)) {
> svm->soft_int_injected = true;
> - svm->soft_int_csbase = vmcb12_csbase;
> - svm->soft_int_old_rip = vmcb12_rip;
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> - else
> - svm->soft_int_next_rip = vmcb12_rip;
> }
>
> /* LBR_CTL_ENABLE_MASK is controlled by svm_update_lbrv() */
> @@ -961,7 +960,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
> nested_svm_copy_common_state(svm->vmcb01.ptr, svm->nested.vmcb02.ptr);
>
> svm_switch_vmcb(svm, &svm->nested.vmcb02);
> - nested_vmcb02_prepare_control(svm, vmcb12->save.rip, vmcb12->save.cs.base);
> + nested_vmcb02_prepare_control(svm);
> nested_vmcb02_prepare_save(svm, vmcb12);
>
> ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
> @@ -1906,7 +1905,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
> nested_copy_vmcb_control_to_cache(svm, ctl);
>
> svm_switch_vmcb(svm, &svm->nested.vmcb02);
> - nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip, svm->vmcb->save.cs.base);
> + nested_vmcb02_prepare_control(svm);
>
> /*
> * While the nested guest CR3 is already checked and set by
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ded4372f2d499..7948e601ea784 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3670,11 +3670,21 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
> * This is done here (as opposed to when preparing vmcb02) to use the
> * most up-to-date value of RIP regardless of the order of restoring
> * registers and nested state in the vCPU save+restore path.
> + *
> + * Similarly, initialize svm->soft_int_* fields here to use the most
> + * up-to-date values of RIP and CS base, regardless of restore order.
> */
> if (is_guest_mode(vcpu) && svm->nested.nested_run_pending) {
> if (boot_cpu_has(X86_FEATURE_NRIPS) &&
> !guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> svm->vmcb->control.next_rip = kvm_rip_read(vcpu);
> +
> + if (svm->soft_int_injected) {
> + svm->soft_int_csbase = svm->vmcb->save.cs.base;
> + svm->soft_int_old_rip = kvm_rip_read(vcpu);
> + if (!guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
> + svm->soft_int_next_rip = kvm_rip_read(vcpu);
> + }
An AI review found a bug here: these fields will be left uninitialized
if we cancel injection before pre_svm_run() (e.g. due to
kvm_vcpu_exit_request()). I was going to suggest moving this to
pre-run, but that leaves a larger window in which RIP can be updated
out from under us. Sean has a better fixup in progress.