Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

From: Hawa, Hanna
Date: Wed Jun 12 2019 - 08:40:21 EST

In reply to: Borislav Petkov: "Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC"
Next in thread: Borislav Petkov: "Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC"

Hi Boris,

> Yap, I think we're in agreement here. I believe the important question
> is whether you need to get error information from multiple sources
> together in order to do proper recovery or doing it per error source
> suffices.
>
> And I think the actual use cases could/should dictate our
> drivers/orchestrators design.
>
> Thus my question how you guys are planning on tying all that error info
> the drivers report, into the whole system design?

We have a daemon script that collects correctable/uncorrectable error
counts from the EDAC sysfs and reports them to an Amazon service, which
allows us to take action when specific error thresholds are reached.
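
For illustration only, here is a minimal sketch of what such a collector
could look like. It is not our actual daemon: it assumes the counters are
exposed under the standard memory-controller path
/sys/devices/system/edac/mc/mc*/{ce_count,ue_count}, the helper names
(read_edac_counts, over_threshold) and the limits are hypothetical, and the
reporting step to the service is left out.

```python
from pathlib import Path

# Standard EDAC memory-controller sysfs root (assumption: the counters of
# interest live here; other EDAC devices use different directories).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each mc* dir under root."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


def over_threshold(counts, ce_limit, ue_limit):
    """Return the sorted names of controllers that reached either limit."""
    return sorted(
        name
        for name, c in counts.items()
        if c["ce"] >= ce_limit or c["ue"] >= ue_limit
    )
```

A caller would poll read_edac_counts() periodically and pass the result to
over_threshold() with whatever limits the policy defines, then report the
flagged controllers.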

Thanks,
Hanna
