[PATCH 0/7] EDAC, mce_amd: Issue decoded MCE through the tracepoint

From: Borislav Petkov
Date: Fri Aug 25 2017 - 06:26:06 EST


Hi all,

here's v2, incorporating all the feedback from last time. The main
difference is that instead of adding yet another tracepoint, I extended
mce_record with the decoded string. This is much more natural and we
should've done it like this from the get-go.

The TP record looks like this:

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     kworker/1:1-91    [001] ....    97.021806: mce_record: CPU: 0, MCGc/s: 0/0, MC5: 9600410000000e0f, IPID: 0000000000000000, ADDR/MISC/SYND: 0000000097370d7b/0000000000000000/0000000000000000, RIP: 00:<0000000000000000>, TSC: 5c226747ec, PROCESSOR: 2:0, TIME: 0, SOCKET: 0, APIC: 0 MC5 Error: CPU Watchdog timer expire.

and userspace can pick apart the fields, as before.
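
To illustrate what "picking apart the fields" could look like on the text
form of the record, here is a minimal Python sketch. It assumes the layout of
the single sample line above (comma-separated "KEY: value" pairs, with the
decoded string following the APIC field); the function name
parse_mce_record() and the SAMPLE constant are made up for this example. A
real consumer would more likely read the binary event fields than scrape the
formatted text.

```python
import re

# The sample record from above, copied verbatim.
SAMPLE = (
    "kworker/1:1-91 [001] .... 97.021806: mce_record: "
    "CPU: 0, MCGc/s: 0/0, MC5: 9600410000000e0f, "
    "IPID: 0000000000000000, "
    "ADDR/MISC/SYND: 0000000097370d7b/0000000000000000/0000000000000000, "
    "RIP: 00:<0000000000000000>, TSC: 5c226747ec, PROCESSOR: 2:0, "
    "TIME: 0, SOCKET: 0, APIC: 0 MC5 Error: CPU Watchdog timer expire."
)


def parse_mce_record(line):
    """Split one formatted mce_record line into (fields, decoded_string).

    Hypothetical helper: assumes the field order of the sample above,
    i.e. that APIC is the last "KEY: value" pair before the decoded text.
    """
    _, sep, payload = line.partition("mce_record: ")
    if not sep:
        raise ValueError("not an mce_record line")

    # Everything up to and including "APIC: <value>" is the field list;
    # whatever follows is the decoded error string.
    m = re.match(r"(?P<fields>.*?APIC: \S+)\s*(?P<decoded>.*)$", payload)
    if m is None:
        raise ValueError("unexpected mce_record layout")

    fields = {}
    for item in m.group("fields").split(", "):
        # Split on the first ": " only, so values such as
        # "00:<0000000000000000>" or "2:0" stay intact.
        key, _, value = item.partition(": ")
        fields[key] = value

    return fields, m.group("decoded").strip()


if __name__ == "__main__":
    fields, decoded = parse_mce_record(SAMPLE)
    print(fields["MC5"])
    print(decoded)
```

The values are kept as strings on purpose; converting the hex fields with
int(value, 16) is left to the caller, since not every field is hexadecimal.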

Next step is adding that to rasdaemon.

Thanks.

Changelog:
==========

v1:

here's a first stab at adding a tracepoint which dumps the decoded MCE
string to userspace. The main idea is to have the decoding functionality
in the kernel and, depending on whether userspace consumers are
listening, to dump the error either to the tracepoint or to dmesg.

In either case, we do the decoding in the kernel and don't need special
userspace. Furthermore, adding new CPU support will have to be done only
in one place.
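
From the userspace side, "listening" could look like the following Python
sketch. It assumes that a consumer counts as listening once the mce_record
event is enabled in tracefs, and that tracefs is mounted at
/sys/kernel/tracing (older setups use /sys/kernel/debug/tracing); neither
assumption comes from the patches themselves. The helper names are made up
for this example, and writing the real enable file needs root.

```python
import os

# Assumed default mount point; pass a different root if tracefs lives
# under /sys/kernel/debug/tracing on your system.
DEFAULT_TRACEFS = "/sys/kernel/tracing"


def _enable_path(tracefs_root):
    # The MCE event lives in the "mce" event system as "mce_record".
    return os.path.join(tracefs_root, "events", "mce", "mce_record", "enable")


def mce_record_enabled(tracefs_root=DEFAULT_TRACEFS):
    """Return True if the mce_record event is currently enabled."""
    with open(_enable_path(tracefs_root)) as f:
        return f.read().strip() == "1"


def set_mce_record_enabled(enabled, tracefs_root=DEFAULT_TRACEFS):
    """Enable or disable the mce_record event."""
    with open(_enable_path(tracefs_root), "w") as f:
        f.write("1" if enabled else "0")
```

With the event enabled, records show up in the trace buffer in the format
shown earlier; with it disabled, the decoded string would be expected in
dmesg instead.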

First 6 patches are cleanups which are good to have regardless, IMO.

Any constructive comments and suggestions are appreciated.

Thanks.

P.S., Thanks to Rostedt for the input!

Borislav Petkov (7):
  x86/mce: Handle an in-kernel MCE decoder
  x86/mce: Extend the MCE tracepoint with a decoded string
  seq_buf: Add seq_buf_clear_buf()
  seq_buf: Export seq_buf_printf() to modules
  EDAC, mce_amd: Convert to seq_buf
  EDAC, mce_amd: Issue the decoded info through the TP or printk()
  x86/mce: Issue the mcelog --ascii message on !AMD

 arch/x86/include/asm/mce.h       |   4 +-
 arch/x86/kernel/cpu/mcheck/mce.c |  14 +-
 drivers/edac/mce_amd.c           | 279 ++++++++++++++++++++++++---------------
 include/linux/seq_buf.h          |   7 +
 include/trace/events/mce.h       |  11 +-
 lib/seq_buf.c                    |   1 +
 6 files changed, 204 insertions(+), 112 deletions(-)

--
2.13.0