Re: [PATCH v2] x86/mce: Do not log spurious corrected mce errors

From: Borislav Petkov
Date: Tue Feb 18 2020 - 11:13:29 EST


On Mon, Feb 17, 2020 at 08:06:59AM -0500, Prarit Bhargava wrote:
> A user has reported that they are seeing spurious corrected errors on
> their hardware.
>
> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
> with the valid field (bit 63) set, the uncorrected error field (bit 61)
> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
> an MCA Error Code (bits [15:0]) of 0x0005."
>
> Block these spurious errors from the console and logs.
>
> Links to Intel Specification updates:
> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html

My previous review comment still holds:

Those links tend to get stale with time. If you really want to refer to
the PDFs, add a new bugzilla entry on https://bugzilla.kernel.org/, add
them there as an attachment and add the link to the entry to the commit
message.

> Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
> Co-developed-by: Alexander Krupp <centos@xxxxxxxxxxxx>

WARNING: Co-developed-by: must be immediately followed by Signed-off-by:
#36:

See Documentation/process/submitting-patches.rst for more detail.

> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: linux-edac@xxxxxxxxxxxxxxx
> ---
> arch/x86/kernel/cpu/mce/core.c | 2 ++
> arch/x86/kernel/cpu/mce/intel.c | 17 +++++++++++++++++
> arch/x86/kernel/cpu/mce/internal.h | 1 +
> 3 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 2c4f949611e4..fe3983d551cc 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1877,6 +1877,8 @@ bool filter_mce(struct mce *m)
> {
> 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> 		return amd_filter_mce(m);
> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> +		return intel_filter_mce(m);
>
> 	return false;
> }
> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
> index 5627b1091b85..989148e6746c 100644
> --- a/arch/x86/kernel/cpu/mce/intel.c
> +++ b/arch/x86/kernel/cpu/mce/intel.c
> @@ -520,3 +520,20 @@ void mce_intel_feature_clear(struct cpuinfo_x86 *c)
> {
> 	intel_clear_lmce();
> }
> +
> +bool intel_filter_mce(struct mce *m)
> +{
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +
> +	/* MCE errata HSD131, HSM142, HSW131, BDM48, and HSM142 */
> +	if ((c->x86 == 6) &&
> +	    ((c->x86_model == INTEL_FAM6_HASWELL) ||
> +	     (c->x86_model == INTEL_FAM6_HASWELL_L) ||
> +	     (c->x86_model == INTEL_FAM6_BROADWELL) ||
> +	     (c->x86_model == INTEL_FAM6_HASWELL_G)) &&
> +	    (m->bank == 0) &&
> +	    ((m->status & 0xa0000000ffffffff) == 0x80000000000f0005))
> +		return true;
> +
> +	return false;
> +}
> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
> index b785c0d0b590..821faba5b05d 100644
> --- a/arch/x86/kernel/cpu/mce/internal.h
> +++ b/arch/x86/kernel/cpu/mce/internal.h
> @@ -175,5 +175,6 @@ extern bool amd_filter_mce(struct mce *m);
> #else
> static inline bool amd_filter_mce(struct mce *m) { return false; };
> #endif
> +extern bool intel_filter_mce(struct mce *m);

It doesn't even build:

ld: arch/x86/kernel/cpu/mce/core.o: in function `filter_mce':
/home/boris/kernel/linux/arch/x86/kernel/cpu/mce/core.c:1881: undefined reference to `intel_filter_mce'
make: *** [Makefile:1077: vmlinux] Error 1

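Hint: do it the way it is done for amd_filter_mce(), but in the respective
#ifdef CONFIG_X86_MCE_INTEL block: declare the function there and add a
static inline stub returning false to the #else branch, so that core.c
still links when CONFIG_X86_MCE_INTEL is not set.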
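
A rough userspace sketch of that pattern, not the actual fix: the
declaration and the stub sit in the two branches of the #ifdef, and the
real function is only built when the option is on. The struct mce here is
a cut-down stand-in with just the two fields the check needs, the CPU
model test from the patch is left out because boot_cpu_data does not exist
outside the kernel, and filter_selftest() is a made-up helper for
illustration. The mask and expected value are the ones from the patch:
bit 63 (valid) set, bit 61 (uncorrected) clear, MSCOD 0x000f, MCACOD 0x0005.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's struct mce; only the fields used below. */
struct mce {
	uint64_t status;
	uint8_t bank;
};

/* Mimics CONFIG_X86_MCE_INTEL=y. Remove this line to get the stub instead. */
#define CONFIG_X86_MCE_INTEL 1

/* What would go into internal.h: real prototype or inline stub. */
#ifdef CONFIG_X86_MCE_INTEL
bool intel_filter_mce(struct mce *m);
#else
static inline bool intel_filter_mce(struct mce *m) { (void)m; return false; }
#endif

/* What would go into intel.c, which is only built when the option is set. */
#ifdef CONFIG_X86_MCE_INTEL
bool intel_filter_mce(struct mce *m)
{
	/*
	 * The mask selects bit 63 (valid), bit 61 (uncorrected) and
	 * bits [31:0] (MSCOD and MCACOD). All other status bits are ignored.
	 */
	return m->bank == 0 &&
	       (m->status & 0xa0000000ffffffffULL) == 0x80000000000f0005ULL;
}
#endif

/* Hypothetical helper: checks the signature against two sample records. */
void filter_selftest(void)
{
	struct mce spurious = { .status = 0x80000000000f0005ULL, .bank = 0 };
	struct mce uncorrected = { .status = 0xa0000000000f0005ULL, .bank = 0 };

#ifdef CONFIG_X86_MCE_INTEL
	assert(intel_filter_mce(&spurious));
#else
	assert(!intel_filter_mce(&spurious));
#endif
	assert(!intel_filter_mce(&uncorrected));
}
```

With the #define removed, the callers compile and link unchanged and
intel_filter_mce() simply never filters anything, which is the behaviour
core.c needs when the Intel MCE code is not built.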

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette