[tip:x86/core] x86/mce: Add support for deferred errors on AMD

From: tip-bot for Aravind Gopalakrishnan
Date: Sun Jun 07 2015 - 13:38:21 EST


Commit-ID: 7559e13fb4abe7880dfaf985d6a1630ca90a67ce
Gitweb: http://git.kernel.org/tip/7559e13fb4abe7880dfaf985d6a1630ca90a67ce
Author: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@xxxxxxx>
AuthorDate: Wed, 6 May 2015 06:58:55 -0500
Committer: Borislav Petkov <bp@xxxxxxx>
CommitDate: Wed, 6 May 2015 20:34:31 +0200

x86/mce: Add support for deferred errors on AMD

Deferred errors indicate error conditions that were not corrected, but
those errors have not been consumed yet. They require no action from
S/W (or action is optional). These errors provide info about a latent
uncorrectable MCE that can occur when a poisoned data is consumed by the
processor.

Newer AMD processors can generate deferred errors and can be configured
to generate APIC interrupts on such events.

SUCCOR stands for S/W UnCorrectable error COntainment and Recovery.
It indicates support for data poisoning in HW and deferred error
interrupts.

Add new bitfield to mce_vendor_flags for this. We use this to verify
presence of deferred error interrupts before we enable them in mce_amd.c

While at it, clarify comments in mce_vendor_flags to provide an
indication of usages of the bitfields.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@xxxxxxx>
Cc: Tony Luck <tony.luck@xxxxxxxxx>
Cc: x86-ml <x86@xxxxxxxxxx>
Cc: linux-edac <linux-edac@xxxxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/1430913538-1415-4-git-send-email-Aravind.Gopalakrishnan@xxxxxxx
[ beef up commit message, do CPUID(8000_0007) only once. ]
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
---
arch/x86/include/asm/mce.h | 15 +++++++++++++--
arch/x86/kernel/cpu/mcheck/mce.c | 10 ++++++++--
2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1f5a86d..407ced6 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -117,8 +117,19 @@ struct mca_config {
};

struct mce_vendor_flags {
- __u64 overflow_recov : 1, /* cpuid_ebx(80000007) */
- __reserved_0 : 63;
+ /*
+ * overflow recovery cpuid bit indicates that overflow
+ * conditions are not fatal
+ */
+ __u64 overflow_recov : 1,
+
+ /*
+ * SUCCOR stands for S/W UnCorrectable error COntainment
+ * and Recovery. It indicates support for data poisoning
+ * in HW and deferred error interrupts.
+ */
+ succor : 1,
+ __reserved_0 : 62;
};
extern struct mce_vendor_flags mce_flags;

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e535533..521e501 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1637,10 +1637,16 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
mce_intel_feature_init(c);
mce_adjust_timer = cmci_intel_adjust_timer;
break;
- case X86_VENDOR_AMD:
+
+ case X86_VENDOR_AMD: {
+ u32 ebx = cpuid_ebx(0x80000007);
+
mce_amd_feature_init(c);
- mce_flags.overflow_recov = cpuid_ebx(0x80000007) & 0x1;
+ mce_flags.overflow_recov = !!(ebx & BIT(0));
+ mce_flags.succor = !!(ebx & BIT(1));
break;
+ }
+
default:
break;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/