On Thu, Sep 03, 2020 at 06:45:30PM -0500, Smita Koralahalli wrote:
Linux Kernel uses ACPI Boot Error Record Table (BERT) to report fatalWhat's that Reported-by for?
errors that occurred in a previous boot. The MCA errors in the BERT are
reported using the x86 Processor Error Common Platform Error Record (CPER)
format. Currently, the record prints out the raw MSR values and AMD relies
on the raw record to provide MCA information.
Extract the raw MSR values of MCA registers from the BERT and feed it into
the standard mce_log() function through the existing x86/MCA RAS
infrastructure. This will result in better decoding from the EDAC MCE
decoder or the default notifier.
The implementation is SMCA specific as the raw MCA register values are
given in the register offset order of the MCAX address space.
Reported-by: kernel test robot <lkp@xxxxxxxxx>
Pls put in [] brackets over it what the 0day robot has reported.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@xxxxxxx>...
---
v3:
Moved arch specific declarations from generic header file to arch
specific header file.
Cleaned additional declarations which are unnecessary.
Included the check for context type.
Added a check to verify for the first MSR address in the register
layout.
v2:
Fixed build error reported by kernel test robot.
Passed struct variable as function argument instead of entire struct
---
arch/x86/include/asm/acpi.h | 11 +++++++++
arch/x86/include/asm/mce.h | 3 +++
arch/x86/kernel/acpi/apei.c | 9 +++++++
arch/x86/kernel/cpu/mce/apei.c | 42 +++++++++++++++++++++++++++++++++
drivers/firmware/efi/cper-x86.c | 10 +++++---
5 files changed, 72 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.cAdd a stub for apei_mce_report_x86_error() in
index c22fb55abcfd..13d60a91eaa0 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -43,3 +43,12 @@ void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
apei_mce_report_mem_error(sev, mem_err);
#endif
}
+
+int arch_apei_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
+{
+ int err = -EINVAL;
+#ifdef CONFIG_X86_MCE
+ err = apei_mce_report_x86_error(ctx_info, lapic_id);
+#endif
+ return err;
+}
arch/x86/include/asm/mce.h, in one of the !CONFIG_X86_MCE ifdeffery
which returns -EINVAL and get rid of this ifdeffery and simply do:
return apei_mce_report_x86_error(ctx_info, lapic_id);
here.
If you wanna fix the above apei_mce_report_mem_error() too, you can do
that in a separate patch.
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.cWhat does that mask mean? Either here or where it is used needs a
index af8d37962586..65001d342302 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -26,6 +26,8 @@
#include "internal.h"
+#define MASK_MCA_STATUS 0xC0002001
comment.
void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)If this function you're adding is SMCA-specific, then its name cannot be
{
struct mce m;
@@ -51,6 +53,46 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
}
EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
+int apei_mce_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
+{
+ const u64 *i_mce = ((const void *) (ctx_info + 1));
+ unsigned int cpu;
+ struct mce m;
+
+ if (!boot_cpu_has(X86_FEATURE_SMCA))
as generic as it is now.
+ return -EINVAL;I don't like that but I don't think we have a reverse mapping from LAPIC
+
+ if ((ctx_info->msr_addr & MASK_MCA_STATUS) != MASK_MCA_STATUS)
+ return -EINVAL;
+
+ mce_setup(&m);
+
+ m.extcpu = -1;
+ m.socketid = -1;
+
+ for_each_possible_cpu(cpu) {
+ if (cpu_data(cpu).initial_apicid == lapic_id) {
ID to logical CPU numbers in the kernel...
+ m.extcpu = cpu;m.socketid = cpu_data(cpu).phys_proc_id;
+ m.socketid = cpu_data(m.extcpu).phys_proc_id;
+ break;Is that structure after cper_ia_proc_ctx defined somewhere in the UEFI
+ }
+ }
+
+ m.apicid = lapic_id;
+ m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
+ m.status = *i_mce;
+ m.addr = *(i_mce + 1);
+ m.misc = *(i_mce + 2);
+ /* Skipping MCA_CONFIG */
+ m.ipid = *(i_mce + 4);
+ m.synd = *(i_mce + 5);
spec so that you can cast to it directly instead of doing this ugly
pointer arithmetic?
+Why is this function exported?
+ mce_log(&m);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(apei_mce_report_x86_error);
If "no reason", you can fix the above one too, with a separate commit.
Thx.