[RFC PATCH v2 0/2] CPU offlining with non-core MCA banks
From: Yazen Ghannam
Date: Thu Aug 29 2024 - 18:32:51 EST
Hi all,

The major change in this revision is to prevent the sysfs interface from
being created in the first place for CPUs that shouldn't be offlined.
This is a more direct way to prevent users from bringing down these CPUs,
and it shouldn't affect internal kernel hotplug flows.

Also, I've changed this set to RFC, because there are still open questions
about how to address this use case. Here are just a couple to start...

1) What if a user wants to offline a CPU, and they don't know or care
   about this restriction?

   Should this behavior be controlled by a kernel parameter? That way, a
   system admin can enforce this policy without affecting the general
   user base.

2) Should this use case be generalized and indicated by the platform?

   Maybe a new flag in the ACPI MADT Processor Local APIC Structure?
   This would be set by firmware to inform the OS to not allow a logical
   CPU to be taken offline. Again, this could be enforced by a system
   admin by changing system BIOS/firmware settings.

Thanks,
Yazen
Link: https://lkml.kernel.org/r/20240821140017.330105-1-yazen.ghannam@xxxxxxx/
v1->v2:
* Change to RFC.
* Include new patch to adjust the number of MCA banks.
* Change solution to prevent the creation of "cpuX/online".
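
As an illustration of the last item (this is not code from the series):
a CPU device that is registered as non-hotpluggable has no "online" file
in its sysfs directory, so the result should be observable from user
space. Below is a minimal user-space sketch that checks for the file. The
helper names cpu_online_path() and cpu_has_online_file() are made up for
this example.

```c
/*
 * User-space sketch, not part of the patch set: report whether a logical
 * CPU exposes the sysfs "online" control file. The function names here
 * are hypothetical helpers for illustration only.
 */
#include <stdio.h>
#include <unistd.h>

/*
 * Build the sysfs path of the "online" file for logical CPU 'cpu'.
 * Returns 0 on success, -1 if the path does not fit in 'buf'.
 */
int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Returns 1 if the "online" file exists (the CPU can be offlined through
 * sysfs), 0 if it does not, and -1 if the path could not be built.
 */
int cpu_has_online_file(unsigned int cpu)
{
	char path[64];

	if (cpu_online_path(path, sizeof(path), cpu))
		return -1;

	return access(path, F_OK) == 0;
}
```

Calling cpu_has_online_file() for each present CPU would be expected to
return 0 for the CPUs this series excludes from offlining and 1 for the
rest, assuming no other reason makes a CPU non-hotpluggable.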

Yazen Ghannam (2):
  x86/mce: Set a more accurate value for mce_num_banks
  x86/mce: Prevent CPU offline for SMCA CPUs with non-core banks

 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/core.c | 22 +++++++++++++++++++++-
 arch/x86/kernel/setup.c        |  2 +-
 3 files changed, 24 insertions(+), 2 deletions(-)

base-commit: 793aa4bf192d0ad07cca001a596f955d121f5c10
--
2.34.1