[PATCH v4 0/4] x86/crash: Fix double NMI shootdown bug
From: Sean Christopherson
Date: Wed Nov 30 2022 - 18:37:08 EST
Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
the hard work. NMI shootdown is a one-time thing; the handler leaves NMIs
blocked and enters halt. At best, a second (or third...) shootdown is an
expensive nop, at worst it can hang the kernel and prevent kexec'ing into
a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
double shootdown resulted in a double list_add(), which is fatal when running
with CONFIG_BUG_ON_DATA_CORRUPTION=y.
With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
be reached after kdump_nmi_shootdown_cpus() (currently the only two users
of nmi_shootdown_cpus()).
To fix, move the disabling of virtualization into crash_nmi_callback(),
remove emergency_vmx_disable_all()'s callback, and do a shootdown for
emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
(obviously), and since I can't envision a use case for an NMI shootdown that
doesn't want to disable virtualization, doing that in the core handler means
emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs, it
doesn't care when that shootdown happened or what callback may have run.
Patches 2-4 bring SVM on par with VMX.
v4:
- Set GIF=1 prior to disabling SVM. [Andrew, Tom]
- Name new helper cpu_emergency_disable_virtualization() instead of
cpu_crash_disable_virtualization().
- Add patch to handle SVM in the stop/reboot case.
- Drop patch to fold __cpu_emergency_vmxoff() into cpu_emergency_vmxoff(),
will be sent as part of a larger cleanup.
v3:
- https://lore.kernel.org/all/20221114233441.3895891-1-seanjc@xxxxxxxxxx
- Re-collect Guilherme's Tested-by.
- Tweak comment in patch 1 to reference STGI instead of CLGI.
- Celebrate this series' half-birthday.
v2:
- Use a NULL handler and crash_ipi_issued instead of a magic nop
handler. [tglx]
- Add comments to call out that modifying the existing handler
once the NMI is sent may cause explosions.
- Add a patch to cleanup cpu_emergency_vmxoff().
- https://lore.kernel.org/all/20220518001647.1291448-1-seanjc@xxxxxxxxxx
v1: https://lore.kernel.org/all/20220511234332.3654455-1-seanjc@xxxxxxxxxx
Sean Christopherson (4):
x86/crash: Disable virt in core NMI crash handler to avoid double
shootdown
x86/virt: Force GIF=1 prior to disabling SVM (for reboot flows)
x86/reboot: Disable virtualization in an emergency if SVM is supported
x86/reboot: Disable SVM, not just VMX, when stopping CPUs
arch/x86/include/asm/reboot.h | 2 +
arch/x86/include/asm/virtext.h | 13 ++++-
arch/x86/kernel/crash.c | 17 +------
arch/x86/kernel/reboot.c | 88 ++++++++++++++++++++++++----------
arch/x86/kernel/smp.c | 6 +--
5 files changed, 82 insertions(+), 44 deletions(-)
base-commit: d800169041c0e035160c8b81f30d4b7e8f8ef777
--
2.38.1.584.g0f3c55d4c2-goog