Re: [PATCH] acpi: apei: call into AER handling regardless of severity
From: Baicar, Tyler
Date: Tue Aug 29 2017 - 17:27:53 EST
On 8/29/2017 2:20 AM, Borislav Petkov wrote:
On Mon, Aug 28, 2017 at 11:11:54AM -0600, Tyler Baicar wrote:
Currently the GHES code only calls into the AER driver for
recoverable type errors. This is incorrect because errors of
other severities do not get logged by the AER driver and do not
get exposed to user space via the AER trace event. So, call
into the AER driver for PCIe errors regardless of the severity.
Signed-off-by: Tyler Baicar <tbaicar@xxxxxxxxxxxxxx>
---
drivers/acpi/apei/ghes.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d45..5cab238 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -489,9 +489,7 @@ static void ghes_do_proc(struct ghes *ghes,
else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
- if (sev == GHES_SEV_RECOVERABLE &&
- sec_sev == GHES_SEV_RECOVERABLE &&
Did you make the effort to see which commit added those lines and read
its commit message?
Doesn't look like it...
Hello Boris,
Here is that commit text:
"ACPI, APEI, GHES: Add PCIe AER recovery support
    aer_recover_queue() is called when recoverable PCIe AER errors are
    notified by firmware to do the recovery work."
The function that does most of the work we need here is
aer_recover_work_func(), which calls into cper_print_aer() and
do_recovery(). Only do_recovery() should be specific to recoverable
errors. We need cper_print_aer() to print the AER-specific information
and to trigger the aer_event trace event that notifies user space.
Otherwise tools such as RAS Daemon will not be notified of correctable
PCIe errors. Looking at cper_print_aer(), it clearly expects to be
called with correctable errors as well. To avoid calling do_recovery()
for correctable errors, I created
https://patchwork.kernel.org/patch/9925877/
The AER core framework for non-firmware-first (non-FF) systems prints
the AER error information for every error and then calls do_recovery()
only for non-correctable errors. See aer_process_err_devices() and
handle_error_source().
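
To make the intended split concrete, here is a minimal user-space sketch of
the pattern described above: report every error, recover only the
non-correctable ones. It is not kernel code. All names (sketch_sev,
sketch_err, sketch_report, sketch_recover, sketch_handle) are hypothetical
stand-ins invented for illustration, with sketch_report() playing the role of
cper_print_aer() and sketch_recover() the role of do_recovery().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical severities, loosely modelled on the AER ones. */
enum sketch_sev { SKETCH_CORRECTABLE, SKETCH_NONFATAL, SKETCH_FATAL };

/* Hypothetical error record; the real code carries far more state. */
struct sketch_err {
	enum sketch_sev sev;
	const char *dev;	/* device name, for the log line only */
};

/* Counters so the behaviour can be observed from outside. */
static int reports_emitted;
static int recoveries_run;

/* Stand-in for the print/trace step: runs for every severity. */
static void sketch_report(const struct sketch_err *err)
{
	printf("%s: PCIe error, severity %d\n", err->dev, (int)err->sev);
	reports_emitted++;
}

/* Stand-in for the recovery step: must not run for correctable errors. */
static void sketch_recover(const struct sketch_err *err)
{
	printf("%s: starting recovery\n", err->dev);
	recoveries_run++;
}

/* Always report; recover only when the error is not correctable. */
static void sketch_handle(const struct sketch_err *err)
{
	bool needs_recovery;

	assert(err != NULL);
	needs_recovery = (err->sev != SKETCH_CORRECTABLE);

	sketch_report(err);
	if (needs_recovery)
		sketch_recover(err);
}
```

Calling sketch_handle() with a correctable error bumps only
reports_emitted, while a non-fatal or fatal error bumps both counters. That
is the behaviour being asked for on the GHES path as well: the severity check
belongs around the recovery step, not around the whole call.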
Thanks,
Tyler
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.