Re: [RFC] x86/mce: Add workaround for SKX/CLX/CPX spurious machine checks

From: Borislav Petkov
Date: Mon Feb 07 2022 - 13:57:10 EST


And while you're working on Tony's request...

On Sun, Feb 06, 2022 at 08:36:40PM -0800, Jue Wang wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5818b837fd4d..06001e3b2ff2 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -834,6 +834,57 @@ static void quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
> m->cs = regs->cs;
> }
>
> +static bool is_intel_srar(u64 mci_status)

You don't need this separate function - stick it all in quirk_skylake_repmov(). See the sketch below the quoted hunk.

> +{
> + return (mci_status &
> + (MCI_STATUS_VAL|MCI_STATUS_OVER|MCI_STATUS_UC|MCI_STATUS_EN|
> + MCI_STATUS_ADDRV|MCI_STATUS_MISCV|MCI_STATUS_PCC|
> + MCI_STATUS_AR|MCI_STATUS_S)) ==
> + (MCI_STATUS_VAL|MCI_STATUS_UC|MCI_STATUS_EN|MCI_STATUS_ADDRV|
> + MCI_STATUS_MISCV|MCI_STATUS_AR|MCI_STATUS_S);
> +}
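
IOW, do the status test directly in the quirk function. A completely
untested sketch, reusing the mask from your patch:

	static bool quirk_skylake_repmov(void)
	{
		u64 mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1));

		/* Bail out if this is not a software-recoverable (SRAR) error. */
		if ((mc1_status &
		     (MCI_STATUS_VAL|MCI_STATUS_OVER|MCI_STATUS_UC|MCI_STATUS_EN|
		      MCI_STATUS_ADDRV|MCI_STATUS_MISCV|MCI_STATUS_PCC|
		      MCI_STATUS_AR|MCI_STATUS_S)) !=
		    (MCI_STATUS_VAL|MCI_STATUS_UC|MCI_STATUS_EN|MCI_STATUS_ADDRV|
		     MCI_STATUS_MISCV|MCI_STATUS_AR|MCI_STATUS_S))
			return false;

		/* ... rest of the quirk ... */
	}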
> +
> +/*
> + * Disable fast string copy and return from the MCE handler upon the first SRAR
> + * MCE on bank 1 due to a CPU erratum on Intel SKX/CLX/CPX CPUs.
> + * The fast string copy instructions ("rep movs*") could consume an
> + * uncorrectable memory error in the cache line _right after_ the
> + * desired region to copy and raise an MCE with RIP pointing to the
> + * instruction _after_ the "rep movs*".
> + * This mitigation addresses the issue completely, with the caveat of
> + * performance degradation on the affected CPU. This is still better
> + * than the OS crashing on MCEs raised on an irrelevant process due to
> + * "rep movs*" accesses in a kernel context (e.g., copy_page).
> + * Since a host drain / fail-over usually starts right after the first
> + * MCE is signaled, which results in VM migration or termination, the
> + * performance degradation is a transient effect.
> + *
> + * Returns true when fast string copy on this CPU should be disabled.
> + */
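
To spell out the scenario that comment describes (hypothetical copy,
names made up - the erratum is in the hardware, not in this code):

	void copy_example(void *dst, void *src)
	{
		/*
		 * With fast strings enabled, memcpy()/copy_page() may
		 * execute as "rep movs". On affected parts, poison in
		 * the cache line at src + PAGE_SIZE - right *after*
		 * the copied region - can be consumed by the copy.
		 */
		memcpy(dst, src, PAGE_SIZE);
		/* Erratum: the MCE RIP points here, past the "rep movs". */
	}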
> +static bool quirk_skylake_repmov(void)
> +{
> + /*
> + * State that represents if an SRAR MCE has already signaled on the DCU bank.
> + */
> + static DEFINE_PER_CPU(bool, srar_dcu_signaled);

What's that needed for?

If the MSR write below clears the CPUID bit, you clear the corresponding
X86_FEATURE flag too. And then this test becomes an X86_FEATURE flag test:

if (this_cpu_has(X86_FEATURE_FSRM))

I'd guess...
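
I.e., a completely untested sketch - whether FSRM is really the right
flag on those models needs checking:

	if (!this_cpu_has(X86_FEATURE_FSRM))
		return false;

	mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1));
	if (is_intel_srar(mc1_status)) {
		msr_clear_bit(MSR_IA32_MISC_ENABLE,
			      MSR_IA32_MISC_ENABLE_FAST_STRING_BIT);
		/* Keep the cached capability bit in sync with the MSR. */
		clear_cpu_cap(&cpu_data(smp_processor_id()), X86_FEATURE_FSRM);
		...
	}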

> + if (unlikely(!__this_cpu_read(srar_dcu_signaled))) {
> + u64 mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1));
> +
> + if (is_intel_srar(mc1_status)) {
> + __this_cpu_write(srar_dcu_signaled, true);
> + msr_clear_bit(MSR_IA32_MISC_ENABLE,
> + MSR_IA32_MISC_ENABLE_FAST_STRING_BIT);
> + mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> + mce_wrmsrl(MSR_IA32_MCx_STATUS(1), 0);
> + pr_err("First SRAR MCE on DCU, CPU: %d, disable fast string copy.\n",

That error message can probably be understood only by a handful of
people on the planet. Is it write-only or is it supposed to be consumed
by humans, and if so, what would be the use case?
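
If it is meant for humans, it should at least say what was done and not
scream at error level - something like (wording only a suggestion):

	pr_warn_once("CPU%d: SRAR MCE on DCU, disabling fast string copies\n",
		     smp_processor_id());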

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette