Re: [PATCH v9 3/6] x86/sev: Disable CPU hotplug while SNP is active
From: Kalra, Ashish
Date: Thu Jun 25 2026 - 22:39:07 EST
On 6/25/2026 5:16 PM, K Prateek Nayak wrote:
> Hello Ashish,
>
> On 6/26/2026 1:12 AM, Kalra, Ashish wrote:
>> Hello Boris,
>>
>> On 6/25/2026 10:02 AM, Borislav Petkov wrote:
>>> On Wed, Jun 24, 2026 at 09:56:49PM +0000, Ashish Kalra wrote:
>>>> +/* Set while SNP has CPU hotplug disabled (kernel-lifetime; survives ccp reload). */
>>>> +static bool snp_cpu_hotplug_disabled;
>>>
>>> Do you really need this?
>>>
>>
>> Yes.
>>
>> cpu_hotplug_disable()/cpu_hotplug_enable() are refcounted (cpu_hotplug_disabled++/--,
>> with a WARN on underflow), so they have to be balanced. This flag collapses them to
>> exactly one outstanding disable per SNP-active window, because the disable and enable
>> sites are not reached a symmetric number of times:
>>> - On firmware without SNP_X86_SHUTDOWN_SUPPORTED, __sev_snp_shutdown_locked() does not
>> call snp_shutdown() (it's gated on data.x86_snp_shutdown), so SNP stays enabled in
>> hardware — SNP_EN stays set and hotplug stays disabled — while sev->snp_initialized is
>> cleared. Re-init after that is routine, the SNP ioctls self-bracket init and shutdown
>> (e.g. SNP_COMMIT, SNP_SET_CONFIG, SNP_VLEK_LOAD):
>>
>> if (!sev->snp_initialized)
>> snp_move_to_init_state(...); /* -> __sev_snp_init_locked -> snp_prepare() */
>> ... SNP_CMD ...
>> if (shutdown_required)
>> __sev_snp_shutdown_locked(...);
>> - So whenever SNP isn't already initialized (psp_init_on_probe off, or after a prior
>> legacy shutdown), every such ioctl does init -> command -> legacy shutdown. Each init
>> reaches snp_prepare() with SNP_EN already set, and the disable now sits at the top of
>> snp_prepare(), so it fires on every cycle. Without this flag that keeps bumping
>> cpu_hotplug_disabled while the legacy shutdown never re-enables — hotplug ends up stuck
>> disabled. This flag makes all but the first disable a no-op.
>>
>> - Also, importantly, kvm-amd module reload on legacy firmware is the same pattern:
>> unload leaves SNP_EN set, reload re-inits.)
>
> Looking at snp_prepare(), we have an early-bailout for
>
> rdmsrq(MSR_AMD64_SYSCFG, val);
> if (val & MSR_AMD64_SYSCFG_SNP_EN)
> return;
>
> Does executing SHUTDOWN command lead to the firmware clearing SNP_EN in
> SYSCFG on all CPUS?
Yes, in case of X86_SNP_SHUTDOWN (available if firmware supports X86SnpShutdown feature)
SNP is disabled on all cores by clearing SYSCFG[SNPEn] bit.
If X86_SNP_SHUTDOWN is set to 1, the firmware clears the SYSCFG[SNPEn] bit in each core.
But, in case of legacy SNP shutdown, SNP_EN bit is not cleared and so SNP remains enabled.
>
> If SNP_EN remains set (and Linux can't clear it since it is
> "Write-1-only" bit), then a subsequent snp_prepare() will skip setting
> SYSCFG if it sees SNP_EN on local CPU.
>
> It can so happen that we enable hotlpug at shutdown, CPUs come online
> without setting SNP_EN in SYSCFG, subsequent snp_prepare() runs on a CPU
> where SNP_EN is still set and skips configuring it for the CPUs that
> don't have it set, and we'll be in a pickle still.
>
> The comment above that bailout saying "this can happen in case of kexec
> boot" makes me believe that SNP_EN remains set until a full system
> reset.
>
> The only safe way to do this is to ensure all possible CPUs are online
> during snp_prepare() and do snp_enable() regardless of whether local CPU
> has SNP_EN or not.
>
> Am I missing something?
>
The piece that makes the early bailout safe is the disable this patch adds:
hotplug is disabled while SNP is active, so the online set can't change under an
active SNP. snp_prepare() already requires online == present, so at a successful
init every present CPU gets SNP_EN, and because hotplug is then disabled none
can leave or rejoin without it. So whenever the bailout is hit with SNP active,
every online CPU already has SNP_EN:
- kexec: SNP_EN is already set on all CPUs by the previous kernel.
- re-init while SNP is still active (e.g. after a legacy SNP_SHUTDOWN that
leaves SNP_EN set): hotplug was disabled the whole time, so the online set is
unchanged and all of them still have SNP_EN.
The only way a CPU can be online without SNP_EN is when SNP is not active --
i.e. after an SNP_INIT failure, where this patch re-enables hotplug. That is
deliberately the same as the behavior before this support existed (hotplug was
never disabled then), and it is benign: SNP_EN only gates RMP checks, the RMP
itself is initialized by SNP_INIT, so on a failed init the RMP is all-zeroes --
every entry is in the default HV-owned state, no page is assigned, no check ever blocks
and snp_initialized stays false, so no SNP guest can be created.
Nothing is enforced and nothing is protected.
So I've kept snp_prepare()'s existing bailout / snp_enable() behavior unchanged;
what this patch adds is disabling hotplug while SNP is active, which is what
actually closes the window (a CPU coming online without SNP_EN while SNP is
live). That window -- and the SNP_EN-stays-set-on-failure situation -- already
exist in today's code, this patch constrains the dangerous (active) case and
otherwise matches current behavior.
(On the v9 placement specifically: I'm moving the disable into snp_prepare()
ahead of SNP_EN in the next version; in v9 it sits after SNP_INIT, which leaves
the window you originally pointed out.)
Thanks,
Ashish
>>
>> - On the enable side it avoids an unbalanced cpu_hotplug_enable() when the teardown/failure
>> paths run without an outstanding disable (e.g. shutdown of a never-fully-initialized SNP).
>>
>> So it's not redundant with cpu_hotplug_disabled — it tracks whether the outstanding disable
>> belongs to this SNP-active window in this kernel, which keeps the single disable/enable
>> balanced across the asymmetric legacy-vs-full SNP teardown paths and re-init.