Re: [PATCH v2 1/4] EDAC/mce_amd: Remove SMCA Extended Error code descriptions

From: Borislav Petkov
Date: Wed Oct 25 2023 - 15:08:36 EST


On Wed, Oct 25, 2023 at 05:14:52AM +0000, Muralidhara M K wrote:
> The SMCA error decoding already exists in rasdaemon, and decoding of future
> banks is supported by the patches below, which are merged in rasdaemon.
> https://github.com/mchehab/rasdaemon/commit/1f74a59ee33b7448b00d7ba13d5ecd4918b9853c rasdaemon: Add new MA_LLC, USR_DP, and USR_CP bank types
> https://github.com/mchehab/rasdaemon/commit/2d15882a0cbfce0b905039bebc811ac8311cd739 rasdaemon: Handle reassigned bit definitions for UMC bank
>

I'm still missing the exact steps a user needs to take in order to
decode such an error.

Please inject an error, catch the error message and show me how one is
supposed to decode it with rasdaemon in case the daemon is not running
while the error happens or the error is fatal and the machine doesn't
even get to run userspace.

If that is not possible with rasdaemon yet, then this patch should not
remove the error descriptions but limit them only to the families for
which they're valid.

Bottom line is, I don't want to end up in the situation mcelog is in,
where decoding errors with it is a total disaster.

IOW, I'd like error decoding on AMD to always work and be trivially easy
to do.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette