[34-longterm 068/196] x86, amd: Disable GartTlbWlkErr when BIOS forgets it

From: Paul Gortmaker
Date: Mon Mar 12 2012 - 21:00:21 EST


From: Joerg Roedel <joerg.roedel@xxxxxxx>

-------------------
This is a commit scheduled for the next v2.6.34 longterm release.
If you see a problem with using this for longterm, please comment.
-------------------

commit 5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e upstream.

This patch disables GartTlbWlk errors on AMD Fam10h CPUs if
the BIOS forgets to do is (or is just too old). Letting
these errors enabled can cause a sync-flood on the CPU
causing a reboot.

The AMD BKDG recommends disabling GART TLB Wlk Error completely.

This patch is the fix for

https://bugzilla.kernel.org/show_bug.cgi?id=33012

on my machine.

Signed-off-by: Joerg Roedel <joerg.roedel@xxxxxxx>
Link: http://lkml.kernel.org/r/20110415131152.GJ18463@xxxxxxxxxx
Tested-by: Alexandre Demers <alexandre.f.demers@xxxxxxxxx>
Signed-off-by: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx>
Signed-off-by: Paul Gortmaker <paul.gortmaker@xxxxxxxxxxxxx>
---
arch/x86/include/asm/msr-index.h | 4 ++++
arch/x86/kernel/cpu/amd.c | 19 +++++++++++++++++++
2 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5928fc0..ad6338e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -82,11 +82,15 @@
#define MSR_IA32_MC0_ADDR 0x00000402
#define MSR_IA32_MC0_MISC 0x00000403

+#define MSR_AMD64_MC0_MASK 0xc0010044
+
#define MSR_IA32_MCx_CTL(x) (MSR_IA32_MC0_CTL + 4*(x))
#define MSR_IA32_MCx_STATUS(x) (MSR_IA32_MC0_STATUS + 4*(x))
#define MSR_IA32_MCx_ADDR(x) (MSR_IA32_MC0_ADDR + 4*(x))
#define MSR_IA32_MCx_MISC(x) (MSR_IA32_MC0_MISC + 4*(x))

+#define MSR_AMD64_MCx_MASK(x) (MSR_AMD64_MC0_MASK + (x))
+
/* These are consecutive and not in the normal 4er MCE bank block */
#define MSR_IA32_MC0_CTL2 0x00000280
#define MSR_IA32_MCx_CTL2(x) (MSR_IA32_MC0_CTL2 + (x))
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6f834f8..dba9ca2 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -569,6 +569,25 @@ static void __cpuinit init_amd(struct cpuinfo_x86 *c)
/* As a rule processors have APIC timer running in deep C states */
if (c->x86 >= 0xf && !cpu_has_amd_erratum(amd_erratum_400))
set_cpu_cap(c, X86_FEATURE_ARAT);
+
+ /*
+ * Disable GART TLB Walk Errors on Fam10h. We do this here
+ * because this is always needed when GART is enabled, even in a
+ * kernel which has no MCE support built in.
+ */
+ if (c->x86 == 0x10) {
+ /*
+ * BIOS should disable GartTlbWlk Errors themself. If
+ * it doesn't do it here as suggested by the BKDG.
+ *
+ * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012
+ */
+ u64 mask;
+
+ rdmsrl(MSR_AMD64_MCx_MASK(4), mask);
+ mask |= (1 << 10);
+ wrmsrl(MSR_AMD64_MCx_MASK(4), mask);
+ }
}

#ifdef CONFIG_X86_32
--
1.7.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/