Re: [PATCH] KVM: nSVM: Snapshot vmcb12 save.rip to prevent TOCTOU race

From: Sean Christopherson

Date: Wed Apr 01 2026 - 13:03:09 EST

+Yosry's email used for upstream stuff

On Thu, Mar 26, 2026, 홍길동 wrote:
> Hi all,
>
> Following Greg's suggestion to turn the proposed fix into a real patch,
> here is a minimal fix for the vmcb12->save.rip TOCTOU race in KVM's
> nested SVM implementation.
>
> Background
> ----------
>
> The CVE-2021-29657 fix introduced nested_copy_vmcb_save_to_cache() to
> snapshot vmcb12 fields before validation and use, preventing a racing L1
> vCPU from modifying vmcb12 between check and use. However, the save area
> cache deliberately excluded rip, rsp, and rax -- only efer, cr0, cr3,
> cr4, dr6, and dr7 are snapshotted.
>
> As a result, vmcb12->save.rip is still read three separate times from
> the live guest-mapped HVA pointer during a single nested VMRUN:
>
> 1) enter_svm_guest_mode() passes vmcb12->save.rip to
> nested_vmcb02_prepare_control(), where it is stored in
> svm->soft_int_old_rip, svm->soft_int_next_rip, and
> vmcb02->control.next_rip
>
> 2) nested_vmcb02_prepare_save() calls
> kvm_rip_write(vcpu, vmcb12->save.rip), setting the KVM-internal
> vCPU register state
>
> 3) nested_vmcb02_prepare_save() then does
> vmcb02->save.rip = vmcb12->save.rip, setting the hardware VMCB02
> save areaq
>
> Since vmcb12 is mapped via kvm_vcpu_map() as a direct HVA into guest
> physical memory with no write protection, a concurrent L1 vCPU can
> modify vmcb12->save.rip between these reads, producing a three-way RIP
> inconsistency. This is the save-area analog of CVE-2021-29657.
>
> The inconsistency is particularly dangerous when combined with soft

Eh, I wouldn't call this dangerous per se. Problematic, sure, but AFAICT the
host is never at risk. My official stance is that any panics due to KVM WARNs
when running with panic_on_warn=1 are NOT considered KVM DoS issues.

KVM WARNs are 100% worth fixing, especially if they're guest- and/or user-triggerable,
but the WARNs themselves aren't security/DoS issues, because in my very strong
opinion, allowing use of /dev/kvm with panic_on_warn=1 when the platform owner
cares about host uptime is user/admin error.

> interrupt injection (event_inj with TYPE_SOFT): KVM records
> soft_int_old_rip from read #1 but the vCPU state and hardware VMCB
> reflect reads #2 and #3 respectively. If interrupt delivery faults,
> svm_complete_interrupts() uses the stale soft_int_old_rip to
> reconstruct pre-injection state, which no longer matches reality.

None of this matches reality though, in the sense that the instant L1 mucks with
vmcb12->save.rip while VMRUN is in-flight, all bets are off. I.e. from an
architectural perspective, KVM doesn't need to get anything "right", because
the L1 hypervisor has firmly triggered undefined behavior.

What exactly goes wrong? The changelog mentions a WARN, and at glance, the only
one that seems relevant is this one in kvm_requeue_exception():

WARN_ON_ONCE(kvm_is_exception_pending(vcpu));

> I am aware of Yosry Ahmed's larger patch series (v3-v6) that
> reworks the entire vmcb12 caching architecture and would subsume
> this fix. However, that series is still under review and has not
> yet been merged. This patch is a minimal, self-contained fix that
> can be applied immediately to close the TOCTOU window on rip, rsp,
> and rax.
>
> Fix
> ---
>
> Add rip, rsp, and rax to struct vmcb_save_area_cached, snapshot them
> in __nested_copy_vmcb_save_to_cache(), and replace all direct reads
> of vmcb12->save.{rip,rsp,rax} with reads from the cached copy. This
> ensures all consumers within a single nested VMRUN see consistent
> register values.

What is actually visibliy problematic?

Assuming the worst case scenario is a WARN, then I'm very strongly inclined to
either (a) not apply this patch at all and instead wait for Yosry's full series,
or (b) have Yosry slot in the most minimal fix (e.g. for just RIP) in a stable@
friendly location in his series.

There are many, many nSVM issues that need to be fixed, many of which are functional
problems for well-behaved setups. For me, those are by far the priority. I also
want to fix the a guest-triggerable WARN_ON_ONCE(), but it's not urgent, and not
something I want to spend a lot of effort on with respect to providing an LTS-friendly
commit (though if we can get one cheaply, that'd be great).