Re: MCE Bug?

From: Borislav Petkov
Date: Thu Jun 18 2015 - 06:25:36 EST


On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking keventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
> traces to the real case of a left-over fatal error and got this splat:

Hmm, and calling mce_log from __mcheck_cpu_init_generic() as you
suggested yesterday seems to work on this box here:

[ 1.588713] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (fam: 06, model: 2d, stepping: 07)
[ 1.592727] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, full-w Broken BIOS detected, complain to your hardware vendor.
[ 1.997344] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[ 2.000146] Intel PMU driver.
[ 2.001376] ... version: 3
[ 2.002919] ... bit width: 48
[ 2.004626] ... generic registers: 4
[ 2.006137] ... value mask: 0000ffffffffffff
[ 2.008064] ... max period: 0000ffffffffffff
[ 2.010010] ... fixed-purpose events: 3
[ 2.011528] ... event mask: 000000070000000f
[ 2.017257] x86: Booting SMP configuration:
[ 2.019232] .... node #0, CPUs: #1
[ 2.033848] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.038730] mce: [Hardware Error]: Machine check events logged
[ 2.050735] #2
[ 2.050735] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.056163] mce: [Hardware Error]: Machine check events logged
[ 2.068133] #3
[ 2.068140] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.07412.324641] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.479404] #5

Stuff gets logged just fine, no splats later.

Hmmm, more staring...

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--