Re: [PATCH] x86/MCE: Prevent CPU offline for SMCA CPUs with non-core banks
From: Thomas Gleixner
Date: Sun Aug 25 2024 - 07:16:50 EST

On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
> Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
> banks. Each of these banks represents unique and separate hardware
> located within the system. Each bank is managed by a single logical CPU;
> they are not shared. Furthermore, the "CPU to MCA bank" assignment
> cannot be modified at run time.
>
> The MCE subsystem supports run time CPU hotplug. Many vendors have
> non-core MCA banks, so MCA settings are not cleared when a CPU is
> offlined for these vendors.
>
> Even though the non-core MCA banks remain enabled, MCA errors will not
> be handled (reported, cleared, etc.) on SMCA systems when the managing
> CPU is offline.
>
> Check if a CPU manages non-core MCA banks and, if so, prevent it from
> being taken offline.

Which in turn breaks hibernation and kexec...

Thanks,
tglx