Re: [PATCH] x86/mce: Unify vendors grading logic and provide AMD machine error checks

From: Borislav Petkov
Date: Tue Mar 08 2022 - 14:33:16 EST


On Tue, Mar 08, 2022 at 12:41:34PM -0600, Carlos Bilbao wrote:
> AMD's severity grading covers very few machine errors. In the graded cases
> there are no user-readable messages, complicating debugging of critical
> hardware errors. Furthermore, with the current implementation AMD MCEs have
> no support for the severities-coverage file. Adding new severities for AMD
> with the current logic would be too convoluted.
>
> Fix the above issues by adding AMD severities to the severity table, in
> combination with Intel MCEs. Unify the severity grading logic of both
> vendors. Label the vendor-specific cases (e.g. cases with different
> registers) where checks cannot be implicit with the available features.
>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@xxxxxxx>
> ---
> arch/x86/include/asm/mce.h | 7 ++
> arch/x86/kernel/cpu/mce/severity.c | 188 +++++++++++++++--------------
> 2 files changed, 103 insertions(+), 92 deletions(-)

Sorry, maybe you're too new to this and you probably haven't read the
old discussions we have had about the severity grading turd. In order to
save you some time: adding more to that macro insanity is not going to
happen.

The AMD severity grading functions are *actually* readable vs this
abomination which I hate with a passion.

If you want to add more logic, you should add to mce_severity_amd(),
perhaps call other helper functions which grade based on a certain
aspect of the error type, split the logic, use comments, etc, but
*definitely* not this.
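
To illustrate the shape meant above, here is a rough, standalone sketch of
one top-level grading function that delegates to small, commented helpers,
each looking at one aspect of the error. It is not the actual
mce_severity_amd(): every sketch_/SKETCH_ name is hypothetical, the status
bit positions are assumed to follow the usual MCi_STATUS layout, and the
grading policy is deliberately simplified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified severity levels; not the kernel's enum. */
enum sketch_severity {
	SKETCH_SEV_NONE,
	SKETCH_SEV_DEFERRED,
	SKETCH_SEV_KEEP,
	SKETCH_SEV_UC,
	SKETCH_SEV_PANIC,
};

/* Assumed status bit positions, for illustration only. */
#define SKETCH_STATUS_VAL	(1ULL << 63)	/* entry is valid */
#define SKETCH_STATUS_OVER	(1ULL << 62)	/* an earlier error was overwritten */
#define SKETCH_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define SKETCH_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */
#define SKETCH_STATUS_DEFERRED	(1ULL << 44)	/* deferred error */

/* Hypothetical stand-in for the fields of an MCE record used here. */
struct sketch_mce {
	uint64_t status;
};

/* Aspect 1: is the processor context still trustworthy? */
static bool sketch_context_corrupt(const struct sketch_mce *m)
{
	return m->status & SKETCH_STATUS_PCC;
}

/*
 * Aspect 2: grade an uncorrected error. Simplifying assumption: if an
 * earlier error was overwritten, we cannot know what was lost, so treat
 * it as fatal.
 */
static enum sketch_severity sketch_grade_uncorrected(const struct sketch_mce *m)
{
	if (m->status & SKETCH_STATUS_OVER)
		return SKETCH_SEV_PANIC;

	return SKETCH_SEV_UC;
}

/* Top level: one readable decision per line, details live in the helpers. */
static enum sketch_severity sketch_grade_amd(const struct sketch_mce *m)
{
	if (!(m->status & SKETCH_STATUS_VAL))
		return SKETCH_SEV_NONE;

	if (sketch_context_corrupt(m))
		return SKETCH_SEV_PANIC;

	if (m->status & SKETCH_STATUS_UC)
		return sketch_grade_uncorrected(m);

	if (m->status & SKETCH_STATUS_DEFERRED)
		return SKETCH_SEV_DEFERRED;

	/* Valid but corrected: keep it for logging. */
	return SKETCH_SEV_KEEP;
}
```

New cases then go into a helper (or a new one) with a comment saying why,
instead of becoming another row in a macro table.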

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette