RE: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor specific HW errors

From: Shiju Jose
Date: Mon Mar 30 2020 - 07:55:40 EST


Hi Boris,

>-----Original Message-----
>From: Borislav Petkov [mailto:bp@xxxxxxxxx]
>Sent: 30 March 2020 11:34
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: linux-acpi@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; linux-
>kernel@xxxxxxxxxxxxxxx; rjw@xxxxxxxxxxxxx; helgaas@xxxxxxxxxx;
>lenb@xxxxxxxxxx; james.morse@xxxxxxx; tony.luck@xxxxxxxxx;
>gregkh@xxxxxxxxxxxxxxxxxxx; zhangliguang@xxxxxxxxxxxxxxxxx;
>tglx@xxxxxxxxxxxxx; Linuxarm <linuxarm@xxxxxxxxxx>; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; tanxiaofei <tanxiaofei@xxxxxxxxxx>;
>yangyicong <yangyicong@xxxxxxxxxx>
>Subject: Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor
>specific HW errors
>
>On Mon, Mar 30, 2020 at 10:14:20AM +0000, Shiju Jose wrote:
>> This field added based on the input from James Morse on v4 patch to
>> enable the user space application(rasdaemon) do the decoding and
>> logging of the any extra error information shared by the corresponding
>> kernel driver to the user space.
>
>How is your error reporting supposed to work?
>
>Your driver is printing error information in dmesg and, at the same time, you
>want to report errors with the rasdaemon.
>
>Currently, the kernel does not report any error info if there's a user agent like
>rasdaemon registered so you need to think about what exactly you're trying
>to achieve here wrt to error handling. Port resetting, printing error info, etc.
>Always ask yourself, what can the user do with the information you're
>printing. And so on...
The error_handled field added on the generic basis for the non-standard errors.
rasdaemon supports adding decoding of the vendor-specific error data, printing and
storing the decoded vendor error information to the sql database.
The idea was the error handled field will help the decoding part of the rasdaemon to do the
appropriate steps for logging the vendor error information depending on whether a corresponding kernel driver
has handled the error or not.
However I think the same can be achieved by adding an error handling status field to the vendor-specific data, which
the kernel driver will set after handling the error and corresponding vendor-specific code in the rasdaemon will use it
while logging the vendor error data.
>
>> Can you please confirm you want all the existing standard
>> errors(memory, ARM, PCIE) in the ghes_do_proc () to be reported
>> through the blocking notifier?
>
>Yes, I would very much prefer to have a generic solution instead of vendor-
>specific stuff left and right.
Sure.

>
>Thx.
>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju