RE: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate an erroneous CPU core

From: Shiju Jose
Date: Fri Oct 02 2020 - 08:23:17 EST


Hi Boris, Hi James,

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@xxxxxxxxx]
>Sent: 01 October 2020 18:31
>To: James Morse <james.morse@xxxxxxx>
>Cc: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx; linux-
>acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; tony.luck@xxxxxxxxx;
>rjw@xxxxxxxxxxxxx; lenb@xxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>
>Subject: Re: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate
>an erroneous CPU core
>
>On Thu, Oct 01, 2020 at 06:16:03PM +0100, James Morse wrote:
>> If the corrected-count is available somewhere, can't this policy be
>> made in user-space?
>
>You mean rasdaemon goes and offlines CPUs when certain thresholds are
>reached? Sure. It would be much more flexible too.

I will send the kernel changes to extend the existing CEC to support CPU correctable errors.
Could you please have a look?

Thanks,
Shiju

>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette