[PATCH V2] x86, mce, amd: Enable interrupts by default if HW capable

From: Aravind Gopalakrishnan
Date: Mon Feb 02 2015 - 11:36:18 EST


We set up APIC vectors for threshold errors if the block is
interrupt_capable. However, we don't set interrupt_enable by default.
Rework the threshold_restart_bank() path so that the first time we
set up lvt_offset, we also set IntType to APIC.

The user can still disable interrupts through sysfs.
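
For illustration only (not part of the patch): a minimal user-space
sketch of the intended default. The struct and both helpers are
hypothetical stand-ins, not the kernel's definitions. They only model
the rule "enabled by default when the hardware is capable, and still
switchable afterwards". Rejecting writes on incapable hardware is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a threshold block. */
struct threshold_block_sketch {
	bool interrupt_capable;	/* hardware can raise threshold interrupts */
	bool interrupt_enable;	/* interrupts are requested for this block */
};

/* Default at setup time: enabled if and only if the hardware is capable. */
static void init_block_defaults(struct threshold_block_sketch *b, bool capable)
{
	assert(b != NULL);
	b->interrupt_capable = capable;
	b->interrupt_enable = capable;
}

/*
 * Models a later write to the sysfs attribute. Returns 0 on success and
 * -1 if the block cannot raise interrupts at all (assumed behavior).
 */
static int store_interrupt_enable(struct threshold_block_sketch *b, bool on)
{
	assert(b != NULL);
	if (!b->interrupt_capable)
		return -1;
	b->interrupt_enable = on;
	return 0;
}
```

Calling init_block_defaults(&b, true) and then
store_interrupt_enable(&b, false) leaves the block capable but
disabled, which is the case the sysfs knob is kept for.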

While at it, check that the status is valid before we log the error
using mce_log(). On multi-node platforms, only the NBC (node base core)
has valid status info, so decoding status values on the non-NBC cores
leads to noise in the kernel log, like so:

[ 440.509744] EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001
[ 466.570925] [Hardware Error]: Corrected error, no action required.
[ 466.570935] [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-|CE|-|-|-
[ 466.570936] [Hardware Error]: Corrected error, no action required.
[ 466.570959] [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-|CE|-|-|-
<...>
[ 466.571293] WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
[ 466.571301] WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
[ 466.571303] Something is rotten in the state of Denmark.
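
For illustration only (not part of the patch): a self-contained sketch
of the validity check described above. The struct, the macro name and
the helper are hypothetical. The sketch assumes the valid flag is bit
63 of the status value and that the misc value is composed from two
32-bit halves, as in the hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the "valid" flag in the status value (bit 63). */
#define SKETCH_MCI_STATUS_VAL (1ULL << 63)

/* Hypothetical, trimmed-down error record. */
struct mce_sketch {
	uint64_t status;
	uint64_t misc;
	int bank;
};

/*
 * Fill in a record only when the status is valid. Returns false and
 * leaves *m untouched otherwise, so the caller can skip logging.
 */
static bool prepare_threshold_record(struct mce_sketch *m, uint64_t status,
				     uint32_t high, uint32_t low, int bank)
{
	assert(m != NULL);
	if (!(status & SKETCH_MCI_STATUS_VAL))
		return false;

	m->status = status;
	m->misc = ((uint64_t)high << 32) | low;	/* high half, then low half */
	m->bank = bank;
	return true;
}
```

A caller would log the record only when the helper returns true, so a
core that reads an invalid status stays silent.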

Suggested-by: Borislav Petkov <bp@xxxxxxx>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@xxxxxxx>
---
Changes in V2:
- The earlier changes that removed the bank == 4 check and the
  'interrupt_enable' attribute caused regressions. Fixed that.
- Moving the setting of threshold_limit and the comment style fixes are
  not directly related to this patch, so they are dropped to cut out
  any distractions.
- Add a fix for garbled dmesg output on multi-node platforms and modify
  the commit message to reflect the change.

arch/x86/kernel/cpu/mcheck/mce_amd.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f1c3769..82c5144 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -250,6 +250,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 			if (!b.interrupt_capable)
 				goto init;

+			b.interrupt_enable = 1;
 			new = (high & MASK_LVTOFF_HI) >> 20;
 			offset = setup_APIC_mce(offset, new);

@@ -322,6 +323,8 @@ static void amd_threshold_interrupt(void)
 log:
 	mce_setup(&m);
 	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
+	if (!(m.status & MCI_STATUS_VAL))
+		return;
 	m.misc = ((u64)high << 32) | low;
 	m.bank = bank;
 	mce_log(&m);
@@ -497,10 +500,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
 	b->interrupt_capable = lvt_interrupt_supported(bank, high);
 	b->threshold_limit = THRESHOLD_MAX;

-	if (b->interrupt_capable)
+	if (b->interrupt_capable) {
 		threshold_ktype.default_attrs[2] = &interrupt_enable.attr;
-	else
+		b->interrupt_enable = 1;
+	} else {
 		threshold_ktype.default_attrs[2] = NULL;
+	}

 	INIT_LIST_HEAD(&b->miscj);

--
2.1.0
