[RFC 2/4] ACPI, APEI, GHES: Add hardware memory error recovery support

From: Huang Ying
Date: Mon Apr 18 2011 - 21:43:58 EST


memory_failure_queue() is called when recoverable memory errors are
notified by firmware to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@xxxxxxxxx>
---
drivers/acpi/apei/Kconfig | 7 +++++++
drivers/acpi/apei/ghes.c | 24 +++++++++++++++++-------
2 files changed, 24 insertions(+), 7 deletions(-)

--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -31,6 +31,13 @@ config ACPI_APEI_PCIEAER
PCIe AER errors may be reported via APEI firmware first mode.
Turn on this option to enable the corresponding support.

+config ACPI_APEI_MEMORY_FAILURE
+ bool "APEI memory error recovering support"
+ depends on ACPI_APEI && MEMORY_FAILURE
+ help
+ Memory errors may be reported via APEI firmware first mode.
+ Turn on this option to enable the memory recovering support.
+
config ACPI_APEI_EINJ
tristate "APEI Error INJection (EINJ)"
depends on ACPI_APEI && DEBUG_FS
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -430,20 +430,30 @@ static void ghes_clear_estatus(struct gh

static void ghes_do_proc(const struct acpi_hest_generic_status *estatus)
{
- int sev, processed = 0;
+ int sev, sec_sev;
struct acpi_hest_generic_data *gdata;

sev = ghes_severity(estatus->error_severity);
apei_estatus_for_each_section(estatus, gdata) {
-#ifdef CONFIG_X86_MCE
+ sec_sev = ghes_severity(gdata->error_severity);
if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
CPER_SEC_PLATFORM_MEM)) {
- apei_mce_report_mem_error(
- sev == GHES_SEV_CORRECTED,
- (struct cper_sec_mem_err *)(gdata+1));
- processed = 1;
- }
+ struct cper_sec_mem_err *mem_err;
+ mem_err = (struct cper_sec_mem_err *)(gdata+1);
+#ifdef CONFIG_X86_MCE
+ apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
+ mem_err);
#endif
+#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
+ if (sev == GHES_SEV_RECOVERABLE &&
+ sec_sev == GHES_SEV_RECOVERABLE &&
+ mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
+ unsigned long pfn;
+ pfn = mem_err->physical_addr >> PAGE_SHIFT;
+ memory_failure_queue(pfn, 0, 0);
+ }
+#endif
+ }
}
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/