Re: [PATCH] x86/mce: Unify vendors grading logic and provide AMD machine error checks

From: Carlos Bilbao
Date: Wed Mar 09 2022 - 11:06:49 EST


On 3/8/2022 1:32 PM, Borislav Petkov wrote:
> On Tue, Mar 08, 2022 at 12:41:34PM -0600, Carlos Bilbao wrote:
>> AMD's severity grading covers very few machine errors. In the graded cases
>> there are no user-readable messages, complicating debugging of critical
>> hardware errors. Furthermore, with the current implementation AMD MCEs have
>> no support for the severities-coverage file. Adding new severities for AMD
>> with the current logic would be too convoluted.
>>
>> Fix the above issues by adding AMD severities to the severity table,
>> alongside the Intel MCEs. Unify the severity grading logic of both
>> vendors. Label the vendor-specific cases (e.g. cases with different
>> registers) where checks cannot be implicit with the available features.
>>
>> Signed-off-by: Carlos Bilbao <carlos.bilbao@xxxxxxx>
>> ---
>> arch/x86/include/asm/mce.h | 7 ++
>> arch/x86/kernel/cpu/mce/severity.c | 188 +++++++++++++++--------------
>> 2 files changed, 103 insertions(+), 92 deletions(-)
>
> Sorry, maybe you're too new to this and you probably haven't read the
> old discussions we have had about the severity grading turd. In order to
> save you some time: adding more to that macro insanity is not going to
> happen.
>
> The AMD severity grading functions are *actually* readable vs this
> abomination which I hate with passion.
>
> If you want to add more logic, you should add to mce_severity_amd(),
> perhaps call other helper functions which grade based on a certain
> aspect of the error type, split the logic, use comments, etc, but
> *definitely* not this.
>
> Thx.
>

Understood, sending a new patch in that direction.
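A rough sketch of the kind of split suggested above: a top-level grading function calls small helpers, each of which grades one aspect of the error and attaches a readable message. Everything here is hypothetical. The struct, the severity enum, the helper names, the in_kernel flag (standing in for the execution context the kernel would derive itself) and the status bit values are simplified stand-ins, not the kernel's definitions and not the real mce_severity_amd().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits, loosely modeled on MCi_STATUS; illustrative only. */
#define STATUS_UC       (1ULL << 61)
#define STATUS_PCC      (1ULL << 57)
#define STATUS_DEFERRED (1ULL << 44)

/* Hypothetical severities, ordered from least to most severe. */
enum sev { SEV_NONE, SEV_DEFERRED, SEV_AR, SEV_PANIC };

/* Hypothetical error record: raw status plus where the error was consumed. */
struct err_ctx {
	uint64_t status;
	bool in_kernel;
};

/* Grade only the "processor context corrupt" aspect. */
static enum sev grade_pcc(const struct err_ctx *e, const char **msg)
{
	if (e->status & STATUS_PCC) {
		*msg = "Processor context corrupt";
		return SEV_PANIC;
	}
	return SEV_NONE;
}

/* Grade only uncorrected errors, split by execution context. */
static enum sev grade_uncorrected(const struct err_ctx *e, const char **msg)
{
	if (!(e->status & STATUS_UC))
		return SEV_NONE;
	if (e->in_kernel) {
		*msg = "Uncorrected error in kernel context";
		return SEV_PANIC;
	}
	*msg = "Uncorrected error in user context";
	return SEV_AR;
}

/* Grade only deferred errors. */
static enum sev grade_deferred(const struct err_ctx *e, const char **msg)
{
	if (e->status & STATUS_DEFERRED) {
		*msg = "Deferred error";
		return SEV_DEFERRED;
	}
	return SEV_NONE;
}

/*
 * Top-level grader: run every aspect-specific helper and keep the
 * highest severity together with the message of the helper that set it.
 */
static enum sev grade_amd_sketch(const struct err_ctx *e, const char **msg)
{
	enum sev (*const graders[])(const struct err_ctx *, const char **) = {
		grade_pcc, grade_uncorrected, grade_deferred,
	};
	enum sev worst = SEV_NONE;

	*msg = "Corrected error";
	for (size_t i = 0; i < sizeof(graders) / sizeof(graders[0]); i++) {
		const char *m = NULL;
		enum sev s = graders[i](e, &m);

		if (s > worst) {
			worst = s;
			*msg = m;
		}
	}
	return worst;
}
```

Keeping each check in its own helper means a new AMD-specific case becomes one more small, commented function with its own message, rather than another entry in a shared macro table.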

Thanks,
Carlos