[PATCH V2 0/9] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

From: Tyler Baicar
Date: Wed Apr 06 2016 - 11:13:42 EST


Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.
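
As an illustration of the acknowledgment step, here is a minimal userspace
sketch. It assumes the GHESv2 entry supplies a read-ack register together with
a preserve mask and a write mask, and it ignores register bit offsets and
access widths. The struct and function names are hypothetical and are not the
ones used in the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the GHESv2 acknowledgment fields. */
struct ghes_v2_ack {
	volatile uint64_t *read_ack_register; /* mapped register (assumed 64-bit) */
	uint64_t preserve_mask;               /* bits to keep from the current value */
	uint64_t write_mask;                  /* bits to set to signal the ack */
};

/* Acknowledge consumption of the current error record. */
static void ghes_v2_ack_error(const struct ghes_v2_ack *ack)
{
	uint64_t val;

	assert(ack && ack->read_ack_register);

	val = *ack->read_ack_register;   /* read the current register value */
	val &= ack->preserve_mask;       /* keep only the bits marked as preserved */
	val |= ack->write_mask;          /* set the acknowledgment bits */
	*ack->read_ack_register = val;   /* write back so firmware can reuse the buffer */
}
```

Until this write happens, the RAS controller can tell that the OS has not yet
consumed the record, so it does not overwrite the record with a new one.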

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1; see Table 18-343, Generic Error Data Entry.
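
A rough sketch of decoding such a timestamp follows. It assumes the 8-byte
field is BCD encoded with the layout seconds, minutes, hours, flags (bit 0 =
precise), day, month, year, century. The type and helper names are
hypothetical, and the layout should be checked against the ACPI table
referenced above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of the Generic Error Data Entry timestamp. */
struct err_time {
	int year, month, day, hour, min, sec;
	int precise; /* non-zero if the firmware marked the timestamp as precise */
};

/* Convert one packed BCD byte (two decimal digits) to its binary value. */
static int bcd_to_bin(uint8_t bcd)
{
	return (bcd >> 4) * 10 + (bcd & 0x0f);
}

/* Decode the 8-byte timestamp, assuming the byte layout described above. */
static struct err_time decode_timestamp(const uint8_t ts[8])
{
	struct err_time t;

	assert(ts);

	t.sec     = bcd_to_bin(ts[0]);
	t.min     = bcd_to_bin(ts[1]);
	t.hour    = bcd_to_bin(ts[2]);
	t.precise = ts[3] & 0x01;
	t.day     = bcd_to_bin(ts[4]);
	t.month   = bcd_to_bin(ts[5]);
	/* Year and century are separate BCD bytes; combine them here. */
	t.year    = bcd_to_bin(ts[7]) * 100 + bcd_to_bin(ts[6]);
	return t;
}
```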

Add support for the ARMv8 Common Platform Error Record (CPER) per the UEFI 2.6
specification. ARMv8-specific processor error information is reported as part of
the CPER records. This provides more detail in processor error logs.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.
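
The handler-plus-notifier arrangement can be pictured with a small userspace
sketch. This is not the kernel notifier API: the fixed-size chain, the
callback signature and all names below are assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_SEA_NOTIFIERS 4

/* Hypothetical callback type: receives the faulting address and a context. */
typedef void (*sea_notifier_fn)(unsigned long fault_addr, void *ctx);

struct sea_notifier {
	sea_notifier_fn fn;
	void *ctx;
};

static struct sea_notifier sea_chain[MAX_SEA_NOTIFIERS];
static size_t sea_chain_len;

/* Register a callback to run when an SEA is taken; returns 0 on success. */
static int sea_register_notifier(sea_notifier_fn fn, void *ctx)
{
	if (!fn || sea_chain_len == MAX_SEA_NOTIFIERS)
		return -1;
	sea_chain[sea_chain_len].fn = fn;
	sea_chain[sea_chain_len].ctx = ctx;
	sea_chain_len++;
	return 0;
}

/*
 * Called from the (hypothetical) abort handler before the faulting process
 * is killed, so every registered consumer can parse and report the error.
 * Returns the number of callbacks invoked.
 */
static size_t sea_notify(unsigned long fault_addr)
{
	size_t i;

	for (i = 0; i < sea_chain_len; i++)
		sea_chain[i].fn(fault_addr, sea_chain[i].ctx);
	return sea_chain_len;
}

/* Example consumer: counts how many SEA events it has seen. */
static void sea_count_errors(unsigned long fault_addr, void *ctx)
{
	(void)fault_addr;
	++*(unsigned int *)ctx;
}
```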

Currently the kernel ignores CPER records that it does not recognize.
However, the UEFI spec allows non-standard (e.g. vendor proprietary)
error section types in CPER, as defined in section N.2.3 of UEFI
version 2.5. As a result, the user cannot see hardware error data
from non-standard sections.

If the Section Type field of a Generic Error Data Entry is unrecognized,
print the raw data to the dmesg buffer and add a tracepoint for
reporting such hardware errors.
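
The fallback path can be sketched roughly as follows. The GUID table holds
placeholder values rather than real CPER section types, and the helper names
are hypothetical. The sketch formats into a caller-supplied buffer instead of
calling printk so that it runs in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Placeholder 16-byte section type GUIDs, not real CPER values. */
static const uint8_t known_types[][16] = { { 0x01 }, { 0x02 } };

/* Return 1 if the section type matches an entry in the known list. */
static int section_type_known(const uint8_t type[16])
{
	size_t i;

	for (i = 0; i < sizeof(known_types) / sizeof(known_types[0]); i++)
		if (memcmp(type, known_types[i], 16) == 0)
			return 1;
	return 0;
}

/*
 * Format raw section bytes as hex, 16 bytes per line. Stops early if the
 * output buffer is too small. Returns the number of characters written,
 * not counting the terminating NUL.
 */
static size_t hex_dump(const uint8_t *data, size_t len,
		       char *out, size_t out_size)
{
	size_t i, pos = 0;

	assert(data && out && out_size > 0);

	out[0] = '\0';
	/* Each byte needs two hex digits, one separator and room for the NUL. */
	for (i = 0; i < len && pos + 4 <= out_size; i++) {
		const char *sep =
			(i + 1 == len || (i + 1) % 16 == 0) ? "\n" : " ";

		pos += (size_t)snprintf(out + pos, out_size - pos, "%02x%s",
					(unsigned int)data[i], sep);
	}
	return pos;
}
```

A caller would check section_type_known() first and fall back to hex_dump()
only when no decoder exists for the section type.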

Currently, even if an error status block's severity is fatal, the kernel
does not honor the severity level and does not panic. With the firmware
first model, the platform could inform the OS about a fatal hardware error
through the non-NMI GHES notification type. The OS should panic when a
hardware error record is received with this severity.
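
The intended decision can be sketched as below. The enum names mirror the
kernel's CPER and GHES severity constants, but this is a standalone userspace
illustration, and treating an unknown severity as fatal is an assumption of
the sketch.

```c
#include <stdbool.h>

/* Severity as reported in the error status block (CPER encoding). */
enum cper_sev {
	CPER_SEV_RECOVERABLE,
	CPER_SEV_FATAL,
	CPER_SEV_CORRECTED,
	CPER_SEV_INFORMATIONAL,
};

/* Severity as acted upon by the OS, ordered from least to most severe. */
enum ghes_sev {
	GHES_SEV_NO,
	GHES_SEV_CORRECTED,
	GHES_SEV_RECOVERABLE,
	GHES_SEV_PANIC,
};

/* Map the firmware-reported severity to the action level used by the OS. */
static enum ghes_sev cper_to_ghes_severity(int cper_severity)
{
	switch (cper_severity) {
	case CPER_SEV_INFORMATIONAL:
		return GHES_SEV_NO;
	case CPER_SEV_CORRECTED:
		return GHES_SEV_CORRECTED;
	case CPER_SEV_RECOVERABLE:
		return GHES_SEV_RECOVERABLE;
	case CPER_SEV_FATAL:
		return GHES_SEV_PANIC;
	default:
		/* Assumption: an unknown severity is handled as fatal. */
		return GHES_SEV_PANIC;
	}
}

/* True if a record with this severity should bring the system down. */
static bool should_panic(int cper_severity)
{
	return cper_to_ghes_severity(cper_severity) >= GHES_SEV_PANIC;
}
```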

Depends on: [PATCH v9] acpi, apei, arm64: APEI initial support for aarch64.
https://lkml.org/lkml/2016/4/5/522

Depends on: [PATCH 00/30] ACPICA: 20160318 Release
https://lkml.org/lkml/2016/3/23/649

V2: - Add PSCI state print for the ARMv8 error type.
    - Separate timestamp year into year and century using BCD format.
    - Rebase on top of ACPICA 20160318 release and remove header file changes
      in include/acpi/actbl1.h.
    - Add panic OS with fatal error status block patch.
    - Add processing of unrecognized CPER error section patches with updates
      from previous comments. Original patches: https://lkml.org/lkml/2015/9/8/646

V1: https://lkml.org/lkml/2016/2/5/544

Jonathan (Zhixiong) Zhang (1):
  acpi: apei: panic OS with fatal error status block

Tyler Baicar (8):
  acpi: apei: read ack upon ghes record consumption
  ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
  efi: parse ARMv8 processor error
  arm64: exception: handle Synchronous External Abort
  arm64: exception: handle instruction abort at current EL
  acpi: apei: handle SEA notification type for ARMv8
  efi: print unrecognized CPER section
  ras: acpi / apei: generate trace event for unrecognized CPER section

arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/system_misc.h | 13 ++
arch/arm64/kernel/entry.S | 19 +++
arch/arm64/mm/fault.c | 58 ++++++--
drivers/acpi/apei/Kconfig | 14 ++
drivers/acpi/apei/ghes.c | 188 ++++++++++++++++++++++++-
drivers/acpi/apei/hest.c | 7 +-
drivers/firmware/efi/cper.c | 264 ++++++++++++++++++++++++++++++++---
drivers/ras/ras.c | 1 +
include/acpi/ghes.h | 1 +
include/linux/cper.h | 72 ++++++++++
include/ras/ras_event.h | 45 ++++++
12 files changed, 646 insertions(+), 37 deletions(-)

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.