Re: [RFC] x86/boot: Fix early boot SEV-SNP panic in direct kernel boot

From: Tom Lendacky

Date: Thu Feb 26 2026 - 09:18:07 EST

On 2/26/26 00:07, Changyuan Lyu wrote:
> Hi all,
>
> I'm writing to report a regression introduced by commit 68a501d7fd82
> ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check") and
> to request feedback on the best approach to fix it.

I submitted a fix for this back on Feb 4th, but it was close to the merge
window, so it wasn't picked up at that time. Please review and comment.

https://lore.kernel.org/lkml/5648b7de5b0a5d0dfef3785f9582b718678c6448.1770217260.git.thomas.lendacky@xxxxxxx/

Thanks,
Tom
>
> == The Bug ==
>
> Commit 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM
> presence check") introduced a regression that causes SEV-SNP guests
> to panic during early boot under specific booting conditions.
>
> By design, snp_vmpl should only be assigned a non-zero value when a
> Secure VM Service Module (SVSM) is enabled and the guest is running
> at a VMPL other than 0. The commit refactored the VMPL0 enforcement
> check in sev_enable() to rely exclusively on this variable:
>
> 	if (snp_vmpl && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
> 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
>
> The panic manifests specifically when there is *no* SVSM present and
> the kernel is booted directly rather than through the EFI stub. When
> booting via the EFI stub (drivers/firmware/efi/libstub/x86-stub.c), the
> EFI loader sets up the environment and zeroed memory before
> sev_enable() is called.
>
> However, lightweight firmware such as Project Oak's stage0
> (https://github.com/project-oak/oak/tree/main/stage0_bin) jumps straight
> to the kernel's 64-bit entry point following the 64-bit Linux boot
> protocol, bypassing the EFI stub entirely. On this direct boot path,
> head_64.S calls sev_enable() very early in the compressed kernel boot
> sequence, well before the .bss section is cleared by the rep stosq loop
> at .Lrelocated.
>
> Because snp_vmpl is declared as an uninitialized global (u8 snp_vmpl;),
> it is placed in the .bss section. When sev_enable() reads it during a
> direct boot, the memory contains uninitialized garbage data. If this
> garbage data happens to be non-zero, the kernel erroneously assumes it
> is running at a non-zero VMPL. Because there is no SVSM present, the
> guest forcefully terminates itself.
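The failure mode can be illustrated with a small userspace model (not kernel code). `model_bss` is a hypothetical stand-in for the decompressor's .bss, with byte 0 playing snp_vmpl, and MODEL_HV_FT_SNP_MULTI_VMPL assumes that the kernel's GHCB_HV_FT_SNP_MULTI_VMPL is bit 5. Only the shape of the condition is taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors GHCB_HV_FT_SNP_MULTI_VMPL (bit 5) in the kernel's
 * sev-common.h; only its role in the check matters for this model. */
#define MODEL_HV_FT_SNP_MULTI_VMPL (UINT64_C(1) << 5)

/* Hypothetical stand-in for the decompressor's .bss; byte 0 is snp_vmpl. */
static uint8_t model_bss[64];

/* Same condition as the post-68a501d7fd82 VMPL0 check in sev_enable(). */
static bool model_vmpl0_check_terminates(uint8_t snp_vmpl, uint64_t hv_features)
{
	return snp_vmpl && !(hv_features & MODEL_HV_FT_SNP_MULTI_VMPL);
}

/*
 * Models one boot: the firmware leaves `leftover` in every byte of the
 * image's .bss, and only if `bss_cleared` is true is it zeroed before
 * sev_enable() reads snp_vmpl (the EFI stub case described above).
 */
static bool model_boot_terminates(bool bss_cleared, uint8_t leftover,
				  uint64_t hv_features)
{
	memset(model_bss, leftover, sizeof(model_bss));
	if (bss_cleared)
		memset(model_bss, 0, sizeof(model_bss));
	return model_vmpl0_check_terminates(model_bss[0], hv_features);
}
```

For example, model_boot_terminates(false, 0xa5, 0) returns true (garbage in .bss, no multi-VMPL support advertised), while model_boot_terminates(true, 0xa5, 0) returns false, as does a direct boot that happens to find a zero byte. This is why the failure depends on whatever the firmware left in memory.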
>
> == Reproduction ==
>
> The issue was reproduced and tested on an AMD EPYC 7B13 64-Core Processor.
> The stage0_sev.bin firmware used for testing can be built from
> https://github.com/project-oak/oak/ via:
>
> $ bazel build //stage0_bin:stage0_bin
>
> 1. Reproducing with QEMU:
>
> $ ./qemu-system-x86_64 -nodefaults -nographic -vga none \
> -M q35,confidential-guest-support=cgs \
> -accel kvm,kernel-irqchip=split \
> -bios stage0_sev.bin \
> -append "console=ttyS0" \
> -initrd initramfs.linux_amd64.cpio \
> -kernel ./vmlinuz-x86 \
> -m size=1024m \
> -smp 2 \
> -serial stdio \
> -cpu host,x2apic \
> -object sev-snp-guest,id=cgs,cbitpos=51,reduced-phys-bits=1
>
> QEMU panic log:
>
> stage0 INFO: jumping to kernel at 0x0000000002000200
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00a00f11
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ...
> Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> <e9> 6d f0 00 00 00 00 00 00 00 00 00 00 00 00 00 ?? ?? ?? ?? ?? ?? ??
>
> 2. Reproducing with Alioth (https://github.com/google/alioth):
>
> $ ./alioth --log-to-file -l trace boot \
> --cmdline "console=ttyS0" \
> --kernel ./vmlinuz-x86 \
> --cpu count=2 \
> --initramfs initramfs.linux_amd64.cpio \
> --memory size=1g,backend=memfd \
> --coco snp,policy=0x30000 \
> --firmware stage0_sev.bin
>
> Alioth panic log:
>
> stage0 INFO: jumping to kernel at 0x0000000002000200
> Error: VM did not shutdown peacefully
> 0: Failed to handle VM exit: KvmRunExitSystemEvent {
> type_: KvmSystemEvent(0x6),
> flags: 0x31100,
> }, at alioth/src/hv/kvm/vcpu/vmexit.rs:84:14
> 1: Failed to run VCPU-0, at /alioth/src/board/board.rs:381:46
> 2: VCPU-0 error, at alioth/src/vm/vm.rs:275:25
> 3: VM did not shutdown peacefully, at alioth-cli/src/boot/boot.rs:474:15
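The `flags` value in the Alioth log can be decoded as a GHCB MSR protocol termination request. This assumes that Alioth prints the value KVM reports for a KVM_SYSTEM_EVENT_SEV_TERM (type 6) exit, and that this value is the GHCB MSR contents the guest wrote. The bit positions follow the GHCB specification's termination request layout. The macro, struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* GHCB MSR protocol termination-request layout. */
#define TERM_REQ_INFO		0x100u	/* GHCBInfo value, bits 11:0 */
#define TERM_INFO_MASK		0xfffu
#define TERM_SET_POS		12	/* reason code set, bits 15:12 */
#define TERM_SET_MASK		0xfu
#define TERM_REASON_POS		16	/* reason code, bits 23:16 */
#define TERM_REASON_MASK	0xffu

struct term_reason {
	unsigned int set;	/* reason code set */
	unsigned int code;	/* reason code within that set */
};

/* Returns false if `val` is not a GHCB termination request. */
static bool decode_term_request(uint64_t val, struct term_reason *out)
{
	if ((val & TERM_INFO_MASK) != TERM_REQ_INFO)
		return false;
	out->set = (val >> TERM_SET_POS) & TERM_SET_MASK;
	out->code = (val >> TERM_REASON_POS) & TERM_REASON_MASK;
	return true;
}
```

Decoding 0x31100 gives reason set 1 (SEV_TERM_SET_LINUX) and reason code 3. That code appears to correspond to GHCB_TERM_NOT_VMPL0 in sev-common.h, the reason passed by the VMPL0 check quoted above.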
>
> == A simple fix ==
>
> I tried moving snp_vmpl to .data:
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index c8c1464b3a56e..a1a1cb47e7b93 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -35,7 +35,7 @@ struct ghcb *boot_ghcb;
>
> #define __BOOT_COMPRESSED
>
> -u8 snp_vmpl;
> +u8 snp_vmpl __section(".data") = 0;
> u16 ghcb_version;
>
> u64 boot_svsm_caa_pa;
> --
>
> I tested locally with this approach, and both VMMs can boot the kernel
> successfully.
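Both the simple fix above and Option 2 below rely on __section(".data") moving a zero-initialized variable out of .bss. The following userspace sketch shows that placement. It assumes GCC or Clang on Linux with a linker that provides the traditional `_edata`, `__bss_start` and `_end` symbols. The decompressor uses its own linker script, which this does not model. The variable and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Traditionally provided by the default linker script (assumption). */
extern char _edata[], __bss_start[], _end[];

/* Implicitly zero-initialized: the compiler places it in .bss. */
static uint8_t bss_candidate;

/* What the kernel's __section(".data") expands to: forced into .data. */
static uint8_t data_candidate __attribute__((__section__(".data"))) = 0;

/* True if `p` lies inside the executable's .bss range. */
static bool is_in_bss(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__bss_start && a < (uintptr_t)_end;
}
```

is_in_bss(&bss_candidate) should be true and is_in_bss(&data_candidate) false. Because data_candidate is part of the loaded image rather than .bss, it already reads as 0 before any clearing loop runs, and such a loop does not touch it afterwards.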
>
> However, Gemini identified that this approach breaks SVSM guests due to
> how the decompressor handles the .bss section during relocation. Below
> is its analysis.
>
> == The Complication (.bss wiping) ==
>
> The seemingly obvious fix is to move `snp_vmpl` to `.data` (e.g.,
> `u8 snp_vmpl __section(".data") = 0;`). However, doing this alone breaks
> SVSM guests due to how the decompressor handles the `.bss` section
> during relocation.
>
> At .Lrelocated in arch/x86/boot/compressed/head_64.S, .bss is wiped to 0.
> Currently, for SVSM guests, both snp_vmpl and boot_svsm_caa_pa (which are
> populated in sev_enable()) are wiped to 0. The kernel accidentally survives
> this wipe because extract_kernel() later calls early_is_sevsnp_guest(),
> which contains a fallback:
>
> 	if (!snp_vmpl) {
> 		/* ... CPUID checks ... */
> 		raw_rdmsr(MSR_SVSM_CAA, &m);
> 		boot_svsm_caa_pa = m.q;
> 		snp_vmpl = U8_MAX;
> 	}
>
> Because snp_vmpl was wiped to 0, this fallback triggers and successfully
> recovers the physical address of the SVSM Calling Area into boot_svsm_caa_pa.
>
> If we move *only* snp_vmpl to .data, it survives the wipe (e.g., snp_vmpl = 1).
> But boot_svsm_caa_pa is still in .bss and gets wiped to 0. The fallback in
> early_is_sevsnp_guest() is skipped (since snp_vmpl != 0), leaving
> boot_svsm_caa_pa == 0. Shortly after, when extract_kernel() attempts to accept
> memory, the guest crashes when it tries to use physical address 0 for the SVSM
> CAA.
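The three placements can be compared with a small userspace model. It is a simplification. It assumes an SVSM is present, so the omitted CPUID checks in early_is_sevsnp_guest() pass. It also assumes that sev_enable() stored the same CAA address that the MSR later returns. The enum, struct and function names are hypothetical.

```c
#include <stdint.h>

/* Where a variable lives in this model (hypothetical). */
enum placement { IN_BSS, IN_DATA };

/*
 * Models an SVSM guest: sev_enable() records a non-zero VMPL and the CAA
 * address, the .Lrelocated loop zeroes whatever sits in .bss, and the
 * early_is_sevsnp_guest() fallback re-reads the CAA only if snp_vmpl is 0.
 * `msr_caa` stands in for what raw_rdmsr(MSR_SVSM_CAA) returns.
 * Returns the CAA address extract_kernel() would go on to use.
 */
static uint64_t model_svsm_boot(enum placement vmpl_where,
				enum placement caa_where,
				uint8_t vmpl, uint64_t msr_caa)
{
	/* sev_enable(), running before the .bss wipe */
	uint8_t snp_vmpl = vmpl;
	uint64_t boot_svsm_caa_pa = msr_caa;

	/* .Lrelocated: rep stosq over .bss */
	if (vmpl_where == IN_BSS)
		snp_vmpl = 0;
	if (caa_where == IN_BSS)
		boot_svsm_caa_pa = 0;

	/* early_is_sevsnp_guest() fallback (CPUID checks omitted) */
	if (!snp_vmpl) {
		boot_svsm_caa_pa = msr_caa;
		snp_vmpl = UINT8_MAX;
	}

	return boot_svsm_caa_pa;
}
```

Only the mixed placement (snp_vmpl in .data, boot_svsm_caa_pa in .bss) ends with a CAA address of 0, which matches the failure described above. Keeping both variables in .bss works only because of the fallback. Keeping both in .data, as Option 2 proposes, works without it.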
>
> == Proposed Solutions ==
>
> I reviewed the AI's analysis above to the best of my ability, and I believe
> it is correct, but I do not have an SVSM environment to test it.
>
> To safely resolve this, we have two options. I'd like to ask the maintainers
> which approach is preferred:
>
> Option 1: Revert commit 68a501d7fd82
>
> By reverting the commit and bringing back the RMPADJUST check, we avoid
> reading uninitialized .bss memory to determine the VMPL level. This sidesteps
> the .bss initialization order issue entirely.
>
> Option 2: Move early SEV variables to .data (Proposed by Gemini)
>
> We can explicitly move snp_vmpl, boot_svsm_caa_pa, and ghcb_version to .data.
>
> u8 snp_vmpl __section(".data") = 0;
> u16 ghcb_version __section(".data") = 0;
> u64 boot_svsm_caa_pa __section(".data") = 0;
>
> This protects them from garbage reads during direct boot and preserves
> the SVSM state set by sev_enable() across the .bss wipe at .Lrelocated,
> removing the reliance on the accidental MSR fallback recovery.
>
> Does anyone have a preference between reverting the original commit
> and moving the affected global variables to .data?
> Or, please let me know if the AI's alert is a false positive.
> I am happy to submit a formal patch for whichever route is preferred.
>
> Thanks,
> Changyuan Lyu