Re: [PATCH v6 2/4] x86/mce: Remove __mcheck_cpu_init_early()

From: Borislav Petkov
Date: Wed Dec 28 2022 - 13:53:40 EST


On Tue, Dec 06, 2022 at 11:36:05AM -0600, Yazen Ghannam wrote:
> + mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
> + mce_flags.succor = !!cpu_has(c, X86_FEATURE_SUCCOR);
> + mce_flags.smca = !!cpu_has(c, X86_FEATURE_SMCA);
> + mce_flags.amd_threshold = 1;
>
> for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
> if (mce_flags.smca)
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5f406d135d32..9efd6d010e2d 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1906,19 +1906,6 @@ static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
> return 0;
> }
>
> -/*
> - * Init basic CPU features needed for early decoding of MCEs.
> - */
> -static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
> -{
> - if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
> - mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
> - mce_flags.succor = !!cpu_has(c, X86_FEATURE_SUCCOR);
> - mce_flags.smca = !!cpu_has(c, X86_FEATURE_SMCA);
> - mce_flags.amd_threshold = 1;
> - }

Yeah, looking at this, before and after the change, what we are and were
doing here is silly. Those flags are global for the whole system but we
do set them on each CPU - unnecessarily, ofc ;-\ - because we don't have
a BSP MCE init call.

The above happens on the mcheck_cpu_init() path, which is per-CPU.

However, if we want to be precise and correct, this flag setup should
happen in a function called

mcheck_bsp_init()

or so, which gets called at the end of identify_boot_cpu() and does
all the *once* actions there: allocating the gen pool, running the quirks
which need to run only once on the BSP, and so on.

So that we don't have to do unnecessary work on every CPU.
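
To make the split concrete, here is a rough userspace sketch, not kernel
code. Only the mcheck_bsp_init() name and the flag names come from this
thread; every *_sketch identifier, the CPU count and the "feature is
always present" helper are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

/* Simplified stand-in for the kernel's system-wide mce_flags. */
struct mce_flags_sketch {
	bool overflow_recov;
	bool succor;
	bool smca;
	bool amd_threshold;
};

static struct mce_flags_sketch flags_sketch;
static bool banks_ready[NR_CPUS_SKETCH];	/* stand-in for per-CPU state */
static int bsp_init_calls, cpu_init_calls;

/* Hypothetical stand-in for cpu_has(): pretend the boot CPU has it all. */
static bool boot_cpu_has_sketch(void)
{
	return true;
}

/* Once-only setup, meant to run on the BSP: the global flags live here. */
static void mcheck_bsp_init_sketch(void)
{
	flags_sketch.overflow_recov = boot_cpu_has_sketch();
	flags_sketch.succor = boot_cpu_has_sketch();
	flags_sketch.smca = boot_cpu_has_sketch();
	flags_sketch.amd_threshold = true;
	bsp_init_calls++;
}

/* Per-CPU setup: only state that really is per-CPU stays here. */
static void mcheck_cpu_init_sketch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	banks_ready[cpu] = true;
	cpu_init_calls++;
}

/* Bring-up order: BSP-only init once, then the per-CPU path everywhere. */
static void bring_up_all_cpus_sketch(void)
{
	int cpu;

	mcheck_bsp_init_sketch();
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		mcheck_cpu_init_sketch(cpu);
}
```

Calling bring_up_all_cpus_sketch() writes the global flags exactly once,
while the per-CPU function still runs NR_CPUS_SKETCH times.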

Tony, thoughts?

I think we should start working towards this - doesn't have to be done
immediately but I think a proper separation of what runs where - once
on the BSP or on every CPU - is needed here. Unless I'm missing an
important angle, which is entirely possible.

Hmmm.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette