Re: [PATCH] x86/mce: Enable HSD131, HSM142, HSW131, BDM48, and HSM142

From: Borislav Petkov
Date: Thu Feb 06 2020 - 06:10:09 EST


On Wed, Feb 05, 2020 at 07:58:31AM -0500, Prarit Bhargava wrote:

> Subject: Re: [PATCH] x86/mce: Enable HSD131, HSM142, HSW131, BDM48, and HSM142

That subject is unreadable for humans.

> Intel Errata HSD131, HSM142, HSW131, and BDM48 report that
> "spurious corrected errors may be logged in the IA32_MC0_STATUS register
> with the valid field (bit 63) set, the uncorrected error field (bit 61)
> not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and
> an MCA Error Code (bits [15:0]) of 0x0005."
>
> Block these spurious errors from the console and logs.

Are they being hit in the wild? If not, why do we need this?

> Links to Intel Specification updates:
> HSD131: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-desktop-specification-update.html
> HSM142: https://www.intel.com/content/www/us/en/products/docs/processors/core/4th-gen-core-family-mobile-specification-update.html
> HSW131: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-spec-update.html
> BDM48: https://www.intel.com/content/www/us/en/products/docs/processors/core/5th-gen-core-family-spec-update.html

Those links tend to get stale with time. If you really want to refer to
the PDFs, open a new bugzilla entry on https://bugzilla.kernel.org/,
attach them there and put the link to that entry in the commit message.

> Signed-off-by: Alexander Krupp <centos@xxxxxxxxxxxx>

What's that Signed-off-by: tag supposed to mean?

> Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: linux-edac@xxxxxxxxxxxxxxx
> ---
> arch/x86/kernel/cpu/mce/core.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)

If at all, this should be done by adding an intel_filter_mce() function
and calling it from filter_mce() so that such errors don't get logged.
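
To make the idea concrete, here is a rough user-space sketch of the status
check such a filter would perform. It is only an illustration: struct
mce_sketch, the SK_* macros and the *_sketch function names are made up for
this example and are not the kernel's definitions. It also assumes the caller
has already established that the CPU is one of the affected Haswell/Broadwell
models, which a real filter would have to check too. The bit positions and
codes come straight from the erratum text quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct mce; only the fields used here. */
struct mce_sketch {
	uint64_t status;	/* value read from IA32_MCi_STATUS */
	uint8_t  bank;		/* MCA bank the status was read from */
};

/* Bit layout as described in the erratum text (names are illustrative). */
#define SK_STATUS_VAL	(1ULL << 63)	/* valid field */
#define SK_STATUS_UC	(1ULL << 61)	/* uncorrected error field */
#define SK_MSCOD_MASK	0xffff0000ULL	/* Model Specific Error Code, bits [31:16] */
#define SK_MCACOD_MASK	0x0000ffffULL	/* MCA Error Code, bits [15:0] */

/*
 * Return true if the record matches the spurious corrected error:
 * bank 0, valid set, UC clear, MSCOD == 0x000F, MCACOD == 0x0005.
 * All other status bits are ignored.
 */
static bool intel_filter_mce_sketch(const struct mce_sketch *m)
{
	const uint64_t mask  = SK_STATUS_VAL | SK_STATUS_UC |
			       SK_MSCOD_MASK | SK_MCACOD_MASK;
	const uint64_t match = SK_STATUS_VAL | (0x000fULL << 16) | 0x0005ULL;

	if (m->bank != 0)
		return false;

	return (m->status & mask) == match;
}

/*
 * Sketch of the generic entry point: dispatch to the vendor-specific
 * filter. A true return value means "do not log this error".
 */
static bool filter_mce_sketch(const struct mce_sketch *m, bool is_affected_intel)
{
	return is_affected_intel && intel_filter_mce_sketch(m);
}
```

Keeping the vendor-specific match in its own function means the generic
logging path only needs a single call and stays free of erratum details.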

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette