Re: MCE/Package power limit notification
From: Fenghua Yu
Date: Tue Nov 29 2011 - 16:43:43 EST
On Tue, Nov 29, 2011 at 01:24:24PM -0800, Udo Steinberg wrote:
> On Mon, 28 Nov 2011 14:50:47 -0800 Yu, Fenghua (YF) wrote:
>
> YF> I sent out a patch to remove the mcelog info. Could you try it and see if it works for you?
> YF> https://lkml.org/lkml/2011/11/14/239
> YF>
> YF> Thanks.
> YF>
> YF> -Fenghua
>
> Hi Fenghua,
>
> Thanks for the patch. It works and eliminates the MCE warnings. What exactly
> are the BIOS issues mentioned in the patch description? Is BIOS programming
> some MSRs the wrong way?
Hi, Udo,
Could you please check the counters in /sys/devices/system/cpu/cpu#/thermal_throttle
and see which of them report the thermal events?
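A minimal sketch of that check, assuming Python 3 and the usual sysfs layout
(cpu0, cpu1, ... each with a thermal_throttle directory). The exact set of
counter files depends on the kernel version, so the script does not hardcode
any names. It reads every numeric file it finds and lists the counters that
are non-zero. The helper names read_throttle_counters and nonzero_counters are
hypothetical and used only for illustration.

```python
import glob
import os


def read_throttle_counters(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {counter_name: value}} for every CPU directory under
    sysfs_root that has a thermal_throttle subdirectory (hypothetical helper)."""
    result = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "thermal_throttle")
    for tdir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(tdir))
        counters = {}
        for name in sorted(os.listdir(tdir)):
            path = os.path.join(tdir, name)
            if not os.path.isfile(path):
                continue
            try:
                with open(path) as f:
                    counters[name] = int(f.read().strip())
            except (OSError, ValueError):
                # Skip entries that cannot be read or are not plain integers.
                continue
        result[cpu] = counters
    return result


def nonzero_counters(counters):
    """Keep only CPUs and counters that have recorded at least one event."""
    return {
        cpu: {name: value for name, value in c.items() if value}
        for cpu, c in counters.items()
        if any(c.values())
    }


if __name__ == "__main__":
    data = read_throttle_counters()
    if not data:
        print("no thermal_throttle directories found")
    for cpu, c in sorted(nonzero_counters(data).items()):
        for name, value in sorted(c.items()):
            print(f"{cpu}: {name} = {value}")
```

Running the script once before and once after a BIOS update, and comparing
the output, gives a rough indication of whether the events still occur.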
The idea behind the patch is to stop logging these events in mcelog and instead
report them in the respective counters. That way, the events are no longer
reported as alarming hardware errors, but they are still captured in the counters.
I think the BIOS/firmware sets up the power limit or thermal throttling incorrectly,
which triggers spurious events. You could try an updated BIOS to see whether the
events go away.
Thanks.
-Fenghua