Re: [PATCH] x86/mce: Cover grading of AMD machine error checks

From: Borislav Petkov
Date: Wed Mar 09 2022 - 13:37:35 EST


Definitely a step in the right direction.

Now...

On Wed, Mar 09, 2022 at 11:41:07AM -0600, Carlos Bilbao wrote:
> AMD's severity grading covers very few machine errors. In the graded cases
> there are no user-readable messages, complicating debugging of critical
> hardware errors.

That's too generic. What is the actual use case you're spending all
this time on?

> Fix the above issues extending the current grading logic for AMD with cases
> not previously considered and their corresponding messages.
>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@xxxxxxx>
> ---
> arch/x86/include/asm/mce.h | 6 +
> arch/x86/kernel/cpu/mce/severity.c | 232 +++++++++++++++++++++++++----
> 2 files changed, 205 insertions(+), 33 deletions(-)

Now, looking at the whole thing, AFAICT all you're interested in is
getting some strings out of those error types. But we already
have something like that. That's even mentioned in the patch:

> + * Default return values. The poll handler catches these and passes
> + * responsibility of decoding them to EDAC

So there's a big fat module mce_amd.c which does convert MCEs to
strings. So why can't that be used and extended instead of adding more
strings to more places in the kernel?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette