Re: [PATCH 05/21] KVM: SEV: Lock all vCPUs when synchronizing VMSAs for SNP launch finish
From: Sean Christopherson
Date: Wed Apr 08 2026 - 14:43:24 EST
On Wed, Apr 08, 2026, Srikanth Aithal wrote:
> On 3/11/2026 5:18 AM, Sean Christopherson wrote:
> > Lock all vCPUs when synchronizing and encrypting VMSAs for SNP guests, as
> > allowing userspace to manipulate and/or run a vCPU while its state is being
> > synchronized would at best corrupt vCPU state, and at worst crash the host
> > kernel.
> >
> > Opportunistically assert that vcpu->mutex is held when synchronizing its
> > VMSA (the SEV-ES path already locks vCPUs).
> >
> > Fixes: ad27ce155566 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/svm/sev.c | 16 +++++++++++++---
> > 1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 5de36bbc4c53..c10c71608208 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -882,6 +882,8 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
> > u8 *d;
> > int i;
> > + lockdep_assert_held(&vcpu->mutex);
> > +
> > if (vcpu->arch.guest_state_protected)
> > return -EINVAL;
> > @@ -2456,6 +2458,10 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > if (kvm_is_vcpu_creation_in_progress(kvm))
> > return -EBUSY;
> > + ret = kvm_lock_all_vcpus(kvm);
> > + if (ret)
> > + return ret;
> > +
> > data.gctx_paddr = __psp_pa(sev->snp_context);
> > data.page_type = SNP_PAGE_TYPE_VMSA;
> > @@ -2465,12 +2471,12 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > ret = sev_es_sync_vmsa(svm);
> > if (ret)
> > - return ret;
> > + goto err;
> > /* Transition the VMSA page to a firmware state. */
> > ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
> > if (ret)
> > - return ret;
> > + goto err;
> > /* Issue the SNP command to encrypt the VMSA */
> > data.address = __sme_pa(svm->sev_es.vmsa);
> > @@ -2479,7 +2485,7 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > if (ret) {
> > snp_page_reclaim(kvm, pfn);
> > - return ret;
> > + goto err;
> > }
> > svm->vcpu.arch.guest_state_protected = true;
> > @@ -2494,6 +2500,10 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
> > }
> > return 0;
> > +
> > +err:
> > + kvm_unlock_all_vcpus(kvm);
> > + return ret;
/facepalm
With an assist from lockdep (see below), I forgot to actually unlock the vCPUs
in the *success* path. By pure dumb luck, the failure manifested as a guest hang
instead of a hard deadlock: kvm_vcpu_ioctl() takes vcpu->mutex with
mutex_lock_killable(), so killing QEMU still works.
I'll squash this (assuming it fixes the problem you're seeing), and omit this
entire series from the initial 7.1 pull requests. If everything looks good, I'll
plan on sending a second pull request for this series after it's had more time to
soak in -next.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2010b157e288..770f7dfc0e5c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2512,12 +2512,12 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
ret = sev_es_sync_vmsa(svm);
if (ret)
- goto err;
+ goto out;
/* Transition the VMSA page to a firmware state. */
ret = rmp_make_private(pfn, INITIAL_VMSA_GPA, PG_LEVEL_4K, sev->asid, true);
if (ret)
- goto err;
+ goto out;
/* Issue the SNP command to encrypt the VMSA */
data.address = __sme_pa(svm->sev_es.vmsa);
@@ -2526,7 +2526,7 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
if (ret) {
snp_page_reclaim(kvm, pfn);
- goto err;
+ goto out;
}
svm->vcpu.arch.guest_state_protected = true;
@@ -2540,9 +2540,7 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
svm_enable_lbrv(vcpu);
}
- return 0;
-
-err:
+out:
kvm_unlock_all_vcpus(kvm);
return ret;
}
> > }
> > static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
>
>
> I am seeing an SNP guest boot failure starting with linux-next tag
> next-20260406 [1].
>
> The SNP guest hangs during boot.
...
> There are no error messages on either the host or guest serial console when
> this happens.
WARNING: Nested lock was not taken
7.0.0-smp--36ad607330fb-snp #112 Tainted: G U W O
----------------------------------
qemu/39235 is trying to lock:
ffff8d0e590c00b0 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_lock_all_vcpus+0xab/0x180 [kvm]
but this task is not holding:
&kvm->lock
stack backtrace:
CPU: 123 UID: 0 PID: 39235 Comm: qemu Tainted: G U W O 7.0.0-smp--36ad607330fb-snp #112 PREEMPTLAZY
Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
Call Trace:
<TASK>
dump_stack_lvl+0x54/0x70
__lock_acquire+0x7b9/0x2900
reacquire_held_locks+0x107/0x160
lock_release+0x177/0x360
__mutex_unlock_slowpath+0x3c/0x2b0
sev_mem_enc_ioctl+0x3c9/0x400 [kvm_amd]
kvm_vm_ioctl+0x57c/0x5d0 [kvm]
__se_sys_ioctl+0x6d/0xb0
do_syscall_64+0xe8/0x920
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
other info that might help us debug this:
no locks held by qemu/39235.
stack backtrace:
CPU: 123 UID: 0 PID: 39235 Comm: qemu Tainted: G U W O 7.0.0-smp--36ad607330fb-snp #112 PREEMPTLAZY
Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.86.0-102 01/25/2026
Call Trace:
<TASK>
dump_stack_lvl+0x54/0x70
__lock_acquire+0x7de/0x2900
reacquire_held_locks+0x107/0x160
lock_release+0x177/0x360
__mutex_unlock_slowpath+0x3c/0x2b0
sev_mem_enc_ioctl+0x3c9/0x400 [kvm_amd]
kvm_vm_ioctl+0x57c/0x5d0 [kvm]
__se_sys_ioctl+0x6d/0xb0
do_syscall_64+0xe8/0x920
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>