Re: [PATCH v9 3/6] x86/sev: Disable CPU hotplug while SNP is active

From: Kalra, Ashish

Date: Fri Jun 26 2026 - 16:23:49 EST


Hello Prateek,

On 6/25/2026 11:01 PM, K Prateek Nayak wrote:
> Hello Ashish,
>
> On 6/26/2026 8:08 AM, Kalra, Ashish wrote:
>>> Looking at snp_prepare(), we have an early-bailout for
>>>
>>> rdmsrq(MSR_AMD64_SYSCFG, val);
>>> if (val & MSR_AMD64_SYSCFG_SNP_EN)
>>> return;
>>>
>>> Does executing SHUTDOWN command lead to the firmware clearing SNP_EN in
>>> SYSCFG on all CPUS?
>>
>> Yes, in case of X86_SNP_SHUTDOWN (available if firmware supports X86SnpShutdown feature)
>> SNP is disabled on all cores by clearing SYSCFG[SNPEn] bit.
>>
>> If X86_SNP_SHUTDOWN is set to 1, the firmware clears the SYSCFG[SNPEn] bit in each core.
>>
>> But, in case of legacy SNP shutdown, SNP_EN bit is not cleared and so SNP remains enabled.
>
> Ah! That was the bit I was missing. Thanks a ton for clarifying.
>
>>
>>>
>>> If SNP_EN remains set (and Linux can't clear it since it is
>>> "Write-1-only" bit), then a subsequent snp_prepare() will skip setting
>>> SYSCFG if it sees SNP_EN on local CPU.
>>>
>>> It can so happen that we enable hotlpug at shutdown, CPUs come online
>>> without setting SNP_EN in SYSCFG, subsequent snp_prepare() runs on a CPU
>>> where SNP_EN is still set and skips configuring it for the CPUs that
>>> don't have it set, and we'll be in a pickle still.
>>>
>>> The comment above that bailout saying "this can happen in case of kexec
>>> boot" makes me believe that SNP_EN remains set until a full system
>>> reset.
>>>
>>> The only safe way to do this is to ensure all possible CPUs are online
>>> during snp_prepare() and do snp_enable() regardless of whether local CPU
>>> has SNP_EN or not.
>>>
>>> Am I missing something?
>>>
>>
>> The piece that makes the early bailout safe is the disable this patch adds:
>> hotplug is disabled while SNP is active, so the online set can't change under an
>> active SNP. snp_prepare() already requires online == present, so at a successful
>> init every present CPU gets SNP_EN,
>
> How is this enforced? AFAICT, on_each_cpu(snp_enable) will only covers
> the online CPUs and there could be CPUs that have been offlined before
> that right
Right that on_each_cpu() only covers online CPUs -- but snp_prepare() refuses to
proceed unless online == present.

if (!cpumask_equal(cpu_online_mask, cpu_present_mask)) {
ret = -EOPNOTSUPP;
pr_warn("SNP init failed: not all CPUs online. ...");
goto unlock;
}

The check right before on_each_cpu(snp_enable) returns -EOPNOTSUPP if any present CPU
is offline, so SNP init simply fails in that case -- there is no successful init
that leaves a CPU without SNP_EN. The check and on_each_cpu(snp_enable) both run
under cpus_read_lock(), so the online set can't change between the two, at a
successful snp_prepare(), online == present and every present CPU has SNP_EN.
After that this patch disables hotplug, so the set stays == present.

(That online == present requirement is existing snp_prepare() behavior, not
something this patch adds.)

And cpu_hotplug_disable() comes right before cpus_read_lock() as it must not be called
while holding cpus_read_lock(), something like this:


rdmsrq(MSR_AMD64_SYSCFG, val);
if (val & MSR_AMD64_SYSCFG_SNP_EN)
return 0; /* bailout: re-init/kexec, SNP_EN already set */

clear_rmp();

cpu_hotplug_disable(); /* <-- here: after bailout, before cpus_read_lock */

cpus_read_lock();
if (!cpumask_equal(cpu_online_mask, cpu_present_mask)) {
ret = -EOPNOTSUPP;
...
goto unlock; /* will re-enable below */
}
on_each_cpu(mfd_reconfigure, ...);
on_each_cpu(snp_enable, ...);
...
unlock:
cpus_read_unlock();
if (ret)
cpu_hotplug_enable(); /* undo: failed before SNP_EN was set */
return ret;

>
>> and because hotplug is then disabled none
>> can leave or rejoin without it. So whenever the bailout is hit with SNP active,
>> every online CPU already has SNP_EN:
>>
>> - kexec: SNP_EN is already set on all CPUs by the previous kernel.
>
> There is a catch here: you can have offline CPUs during the previous boot
> (say you have maxcpus=8 in your cmdline), and then you kexec with a different
> kernel / cmdline that brings online a bunch more CPUs.
>
> SNP_EN will only be set for a subset of then with the legacy SNP_INIT and
> if snp_prepare() runs on those legacy CPUs, you still skip setting it for
> the ones that don't have SNP_EN set.
>
> Is that case covered somehow or is it a non-issue?
>

It's a non-issue, for two independent reasons.

First, kexec with SNP active currently requires a full SNP shutdown before the
kexec. SNP_SHUTDOWN_EX (and the IOMMU SNP shutdown it performs) fail if there
are any active SNP guests or assigned ASIDs, so a working kexec has to terminate
all SNP guests and run a full shutdown first (via systemctl kexec). On
firmware that supports X86_SNP_SHUTDOWN, that full shutdown clears SNP_EN on all
CPUs, so the kexec target boots with SNP_EN clear and runs a complete, fresh
snp_prepare() -- where online == present is enforced, so every present CPU gets
SNP_EN. There is no inherited partial-SNP_EN state.

Second, even independent of kexec, this kernel's snp_prepare() never sets SNP_EN
on a subset: on_each_cpu(snp_enable) runs only after the
cpumask_equal(cpu_online_mask, cpu_present_mask) check passes, so it's all (every
present CPU) or nothing (snp_prepare() returns -EOPNOTSUPP and SNP_EN is never
set). With maxcpus=8 on a larger system, online != present, so SNP simply does
not initialize -- it cannot leave SNP_EN set on only those 8 cores. A successful
init therefore implies every present CPU has SNP_EN, and the present mask is the
same physical hardware across kexec.

So producing a partial SNP_EN set would require a source kernel that both sets
SNP_EN partially (i.e. doesn't enforce online == present) and skips the
full shutdown before kexec -- neither of which applies here. I think it's a
non-issue in practice.

>> - re-init while SNP is still active (e.g. after a legacy SNP_SHUTDOWN that
>> leaves SNP_EN set): hotplug was disabled the whole time, so the online set is
>> unchanged and all of them still have SNP_EN.
>>
>> The only way a CPU can be online without SNP_EN is when SNP is not active --
>> i.e. after an SNP_INIT failure, where this patch re-enables hotplug. That is
>> deliberately the same as the behavior before this support existed (hotplug was
>> never disabled then), and it is benign: SNP_EN only gates RMP checks, the RMP
>> itself is initialized by SNP_INIT, so on a failed init the RMP is all-zeroes --
>> every entry is in the default HV-owned state, no page is assigned, no check ever blocks
>> and snp_initialized stays false, so no SNP guest can be created.
>> Nothing is enforced and nothing is protected.
>>
>> So I've kept snp_prepare()'s existing bailout / snp_enable() behavior unchanged;
>> what this patch adds is disabling hotplug while SNP is active, which is what
>> actually closes the window (a CPU coming online without SNP_EN while SNP is
>> live). That window -- and the SNP_EN-stays-set-on-failure situation -- already
>> exist in today's code, this patch constrains the dangerous (active) case and
>> otherwise matches current behavior.
>
> Ack! Just that one small bit up above bothers me but other than that,
> doing it in snp_prepare() should be good.
>
> This is all new to me so thanks a ton for answering my queries.
>

Thanks,
Ashish

>>
>> (On the v9 placement specifically: I'm moving the disable into snp_prepare()
>> ahead of SNP_EN in the next version; in v9 it sits after SNP_INIT, which leaves
>> the window you originally pointed out.)