Re: [PATCH] x86/mce: Cover grading of AMD machine error checks

From: Borislav Petkov
Date: Thu Mar 10 2022 - 17:13:50 EST


On Thu, Mar 10, 2022 at 12:24:08PM -0600, Carlos Bilbao wrote:
> We will cover grading of MCEs like deferred memory scrub errors, attempts
> to access poisonous data, etc. I could list all new covered cases in the
> commit message if you think that'd be positive.

So no actual use case - you want to grade error severity for all types
of MCEs.

> Hope that helps clarify,

Yes, it does a bit.

It sounds to me like you want to do at least two patches:

1. Extend the severity grading function with the new types of errors.

2. Add string descriptions of the error types mce_severity_amd() looks
at, so that mce_panic() issues them.

I.e., you want to decode the fatal MCEs which panic the machine.
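
To make the two-patch split concrete, here is a minimal user-space
sketch of a grading function that returns both a severity and a string
describing the error type. It is not the kernel's mce_severity_amd() or
mce_panic(). Every name below (the sketch_ and SKETCH_ identifiers), the
bit positions, and the grading policy are assumptions made up for
illustration and would have to be checked against the real code and the
processor documentation.

```c
#include <stdint.h>

/* Hypothetical severity levels; the kernel's own enum differs. */
enum sketch_severity {
	SKETCH_SEV_KEEP,		/* log only */
	SKETCH_SEV_DEFERRED,		/* deferred error, handle later */
	SKETCH_SEV_ACTION_REQUIRED,	/* recoverable, needs handling */
	SKETCH_SEV_PANIC,		/* fatal */
};

/* Illustrative status bits; positions are assumptions, not a reference. */
#define SKETCH_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define SKETCH_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */
#define SKETCH_STATUS_DEFERRED	(1ULL << 44)	/* deferred error */
#define SKETCH_STATUS_POISON	(1ULL << 43)	/* poisoned data consumed */

/* Step 1 yields the severity, step 2 the message to print with it. */
struct sketch_grade {
	enum sketch_severity sev;
	const char *msg;
};

/* Grade a status value; the policy here is a made-up example. */
static struct sketch_grade sketch_grade_status(uint64_t status)
{
	if (status & SKETCH_STATUS_PCC)
		return (struct sketch_grade){ SKETCH_SEV_PANIC,
					      "Processor context corrupt" };

	if (status & SKETCH_STATUS_UC) {
		if (status & SKETCH_STATUS_POISON)
			return (struct sketch_grade){ SKETCH_SEV_ACTION_REQUIRED,
						      "Poisoned data consumed" };
		return (struct sketch_grade){ SKETCH_SEV_PANIC,
					      "Uncorrected error" };
	}

	if (status & SKETCH_STATUS_DEFERRED)
		return (struct sketch_grade){ SKETCH_SEV_DEFERRED,
					      "Deferred error" };

	return (struct sketch_grade){ SKETCH_SEV_KEEP, "Corrected error" };
}
```

A panic path would then print the msg field next to the raw status
value, which is the "decode the fatal MCEs" part. Returning the string
together with the severity keeps the two from drifting apart when new
error types are added.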

In general, what would help is if you think about what you're trying to
achieve and write it down first. How to achieve that we can figure out
later.

What happens now is you send me a patch and I'm trying to decipher from
the code why you're doing what you're doing. Which is kinda backwards if
you think about it...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette