[RFC] x86/boot: Fix early boot SEV-SNP panic in direct kernel boot

From: Changyuan Lyu

Date: Thu Feb 26 2026 - 01:08:46 EST

Hi all,

I'm writing to report a regression introduced by commit 68a501d7fd82
("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check") and
to request feedback on the best approach to fix it.

== The Bug ==

Commit 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM
presence check") introduced a regression that causes SEV-SNP guests
to panic during early boot under specific boot conditions.

By design, snp_vmpl should only be assigned a non-zero value when a
Secure VM Service Module (SVSM) is enabled and the guest is running
at a VMPL other than 0. The commit refactored the VMPL0 enforcement
check in sev_enable() to rely exclusively on this variable:

    if (snp_vmpl && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
        sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);

The panic manifests only when there is *no* SVSM present and the
kernel is booted via direct kernel boot rather than the EFI stub. When
booting via the EFI stub
(drivers/firmware/efi/libstub/x86-stub.c), the environment and zeroed
memory are set up by the EFI loader before sev_enable() is called.

However, lightweight firmware such as Project Oak's stage0
(https://github.com/project-oak/oak/tree/main/stage0_bin) jumps straight
to the kernel's 64-bit entry point following the 64-bit Linux boot
protocol, bypassing the EFI stub entirely. On this direct boot path,
head_64.S calls sev_enable() very early in the compressed kernel boot
sequence, well before the .bss section is cleared by the rep stosq
routine in .Lrelocated.

Because snp_vmpl is declared as an uninitialized global (u8 snp_vmpl;),
it is placed in the .bss section. When sev_enable() reads it during a
direct boot, the memory contains uninitialized garbage data. If this
garbage data happens to be non-zero, the kernel erroneously assumes it
is running at a non-zero VMPL. Because there is no SVSM present, the
guest forcefully terminates itself.
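
To make the ordering problem concrete, here is a minimal user-space
sketch (not kernel code). It assumes a hypothetical struct fake_bss
standing in for the decompressor's .bss and a boolean standing in for
the GHCB_HV_FT_SNP_MULTI_VMPL feature bit; all function names are
invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the part of .bss that holds snp_vmpl. */
struct fake_bss {
	uint8_t snp_vmpl;
};

/*
 * Same shape as the check quoted above: returns true when the guest
 * would terminate with GHCB_TERM_NOT_VMPL0.
 */
bool vmpl0_check_terminates(const struct fake_bss *bss, bool hv_multi_vmpl)
{
	return bss->snp_vmpl && !hv_multi_vmpl;
}

/* Direct boot: the check runs before anything has cleared .bss. */
bool direct_boot_terminates(uint8_t stale_byte)
{
	struct fake_bss bss;

	memset(&bss, stale_byte, sizeof(bss));	/* leftover memory contents */
	return vmpl0_check_terminates(&bss, false);
}

/* A path where memory is zeroed before the check runs. */
bool zeroed_boot_terminates(uint8_t stale_byte)
{
	struct fake_bss bss;

	memset(&bss, stale_byte, sizeof(bss));	/* leftover memory contents */
	memset(&bss, 0, sizeof(bss));		/* zeroing happens first */
	return vmpl0_check_terminates(&bss, false);
}
```

With no SVSM, nothing ever writes snp_vmpl, so in this model the
outcome of the direct boot path depends only on whatever byte was left
in memory.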

== Reproduction ==

The issue was reproduced and tested on an AMD EPYC 7B13 64-Core Processor.
The stage0_sev.bin firmware used for testing can be built from
https://github.com/project-oak/oak/ via:

$ bazel build //stage0_bin:stage0_bin

1. Reproducing with QEMU:

$ ./qemu-system-x86_64 -nodefaults -nographic -vga none \
-M q35,confidential-guest-support=cgs \
-accel kvm,kernel-irqchip=split \
-bios stage0_sev.bin \
-append "console=ttyS0" \
-initrd initramfs.linux_amd64.cpio \
-kernel ./vmlinuz-x86 \
-m size=1024m \
-smp 2 \
-serial stdio \
-cpu host,x2apic \
-object sev-snp-guest,id=cgs,cbitpos=51,reduced-phys-bits=1

QEMU panic log:

stage0 INFO: jumping to kernel at 0x0000000002000200
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00a00f11
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
...
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<e9> 6d f0 00 00 00 00 00 00 00 00 00 00 00 00 00 ?? ?? ?? ?? ?? ?? ??

2. Reproducing with Alioth (https://github.com/google/alioth):

$ ./alioth --log-to-file -l trace boot \
--cmdline "console=ttyS0" \
--kernel ./vmlinuz-x86 \
--cpu count=2 \
--initramfs initramfs.linux_amd64.cpio \
--memory size=1g,backend=memfd \
--coco snp,policy=0x30000 \
--firmware stage0_sev.bin

Alioth panic log:

stage0 INFO: jumping to kernel at 0x0000000002000200
Error: VM did not shutdown peacefully
0: Failed to handle VM exit: KvmRunExitSystemEvent {
type_: KvmSystemEvent(0x6),
flags: 0x31100,
}, at alioth/src/hv/kvm/vcpu/vmexit.rs:84:14
1: Failed to run VCPU-0, at /alioth/src/board/board.rs:381:46
2: VCPU-0 error, at alioth/src/vm/vm.rs:275:25
3: VM did not shutdown peacefully, at alioth-cli/src/boot/boot.rs:474:15

== A simple fix ==

I tried moving snp_vmpl to .data:

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c8c1464b3a56e..a1a1cb47e7b93 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -35,7 +35,7 @@ struct ghcb *boot_ghcb;

 #define __BOOT_COMPRESSED

-u8 snp_vmpl;
+u8 snp_vmpl __section(".data") = 0;
 u16 ghcb_version;

 u64 boot_svsm_caa_pa;
--

I tested locally with this approach, and both VMMs can boot the kernel
successfully.

However, Gemini identified that this approach breaks SVSM guests due to
how the decompressor handles the .bss section during relocation. Below
is its analysis.

== The Complication (.bss wiping) ==

The seemingly obvious fix is to move snp_vmpl to .data (e.g.,
u8 snp_vmpl __section(".data") = 0;). However, doing this alone breaks
SVSM guests due to how the decompressor handles the .bss section
during relocation.

At .Lrelocated in arch/x86/boot/compressed/head_64.S, .bss is wiped to 0.
Currently, for SVSM guests, both snp_vmpl and boot_svsm_caa_pa (which are
populated in sev_enable()) are wiped to 0. The kernel accidentally survives
this wipe because extract_kernel() later calls early_is_sevsnp_guest(),
which contains a fallback:

    if (!snp_vmpl) {
        /* ... CPUID checks ... */
        raw_rdmsr(MSR_SVSM_CAA, &m);
        boot_svsm_caa_pa = m.q;
        snp_vmpl = U8_MAX;
    }

Because snp_vmpl was wiped to 0, this fallback triggers and successfully
recovers the physical address of the SVSM Calling Area into boot_svsm_caa_pa.

If we move *only* snp_vmpl to .data, it survives the wipe (e.g., snp_vmpl = 1).
But boot_svsm_caa_pa is still in .bss and gets wiped to 0. The fallback in
early_is_sevsnp_guest() is skipped (since snp_vmpl != 0), leaving
boot_svsm_caa_pa == 0. Shortly after, when extract_kernel() attempts to accept
memory, the guest crashes when it tries to use physical address 0 for the SVSM
CAA.
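
The interaction between the wipe and the fallback can be modelled in a
few lines of user-space C. This is only a sketch of the reasoning
above, not the decompressor code: FAKE_CAA_PA is a made-up address,
struct layout is a hypothetical way to say which variables live in
.data, and the assignment inside the fallback stands in for the
MSR_SVSM_CAA read.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_CAA_PA 0x80001000ULL	/* made-up CAA address for the model */

struct early_sev_state {
	uint8_t  snp_vmpl;
	uint64_t boot_svsm_caa_pa;
};

/* Hypothetical description of which variables survive the .bss wipe. */
struct layout {
	bool vmpl_in_data;
	bool caa_in_data;
};

/* Returns the CAA address seen by the code that runs after relocation. */
uint64_t caa_after_relocation(struct layout l)
{
	/* State left by sev_enable() on an SVSM guest (illustrative values). */
	struct early_sev_state s = {
		.snp_vmpl = 1,
		.boot_svsm_caa_pa = FAKE_CAA_PA,
	};

	/* .Lrelocated: everything still in .bss is zeroed. */
	if (!l.vmpl_in_data)
		s.snp_vmpl = 0;
	if (!l.caa_in_data)
		s.boot_svsm_caa_pa = 0;

	/* Fallback in early_is_sevsnp_guest(): only taken if snp_vmpl == 0. */
	if (!s.snp_vmpl) {
		s.boot_svsm_caa_pa = FAKE_CAA_PA;	/* models the MSR read */
		s.snp_vmpl = UINT8_MAX;
	}

	return s.boot_svsm_caa_pa;
}
```

In this model, the current layout (both in .bss) and the layout with
both variables in .data end up with a valid CAA address, while moving
only snp_vmpl leaves boot_svsm_caa_pa at 0.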

== Proposed Solutions ==

I reviewed the AI's analysis above to the best of my ability and
believe it is correct, but I do not have an SVSM environment to test it.

To safely resolve this, we have two options. I'd like to ask the maintainers
which approach is preferred:

Option 1: Revert commit 68a501d7fd82

By reverting the commit and bringing back the RMPADJUST check, we avoid
reading uninitialized .bss memory to determine the VMPL. This sidesteps
the .bss initialization order issue entirely.

Option 2: Move early SEV variables to .data (Proposed by Gemini)

We can explicitly move snp_vmpl, boot_svsm_caa_pa, and ghcb_version to .data.

u8 snp_vmpl __section(".data") = 0;
u16 ghcb_version __section(".data") = 0;
u64 boot_svsm_caa_pa __section(".data") = 0;

This protects them from the garbage-read during direct boot, and properly
preserves their initialized SVSM states across the .bss wipe at .Lrelocated,
intentionally bypassing the need for the accidental MSR fallback recovery.
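
One detail relevant to Option 2: a bare "= 0" initializer is generally
not enough on its own, because GCC and Clang place zero-initialized
globals in .bss by default (GCC's -fzero-initialized-in-bss). The
kernel's __section() macro expands to the compiler's section
attribute, which is what forces the placement. A minimal user-space
sketch, assuming an ELF target and GCC or Clang; the demo_* names are
hypothetical:

```c
#include <stdint.h>

/* Zero initializer only: still placed in .bss by default. */
uint8_t demo_vmpl_bss = 0;

/* Explicit section attribute: placed in .data despite the zero value. */
uint8_t demo_vmpl_data __attribute__((section(".data"))) = 0;

/* Both read as 0 at run time; only their placement differs. */
uint8_t demo_sum(void)
{
	return (uint8_t)(demo_vmpl_bss + demo_vmpl_data);
}
```

Compiling this with -c and running nm on the object file should show
demo_vmpl_bss with symbol type B and demo_vmpl_data with symbol type D.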

Does anyone have a preference between reverting the original commit
and moving the affected global variables to .data? Alternatively,
please let me know if the AI's concern is a false positive. I am happy
to submit a formal patch for whichever route is preferred.

Thanks,
Changyuan Lyu