Re: [PATCH -v3] x86, MCE: Drop the default decoding notifier

From: Borislav Petkov
Date: Thu Apr 14 2011 - 11:44:31 EST


On Thu, Apr 14, 2011 at 11:23:04AM -0400, Prarit Bhargava wrote:
> Oops ... I may have confused you because what I did was subtle. I
> really should have explicitly pointed out what I did. Sorry, my bad.
>
> From my patch (sorry for the cut-and-paste):
>
> @@ -239,7 +227,10 @@ static void print_mce(struct mce *m)
> * Print out human-readable details about the MCE error,
> * (if the CPU has an implementation for that)
> */
> - atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> + ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
> + if (ret != NOTIFY_STOP && (m->status & MCI_STATUS_UC))
> + pr_emerg(HW_ERR "Run the above through 'mcelog --ascii' "
> + "to decode.\n");
> }
>
> This, of course, only outputs during UCs.
>
> and
>
> @@ -289,6 +280,8 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
> continue;
> if (!(m->status & MCI_STATUS_UC)) {
> print_mce(m);
> + printk_once(KERN_EMERG HW_ERR "MCE Corrected Error(s) "
> + "detected.");
> if (!apei_err)
> apei_err = apei_write_mce(m);
> }
>
> so we'll print "MCE Corrected Error(s)" _once_ if we go through this
> path. Since there is no data to decode with mcelog, a nice little one
> time message is probably the way to go :).

Ok, first of all, see the print_mce(m) call above? Yes, we're dumping
full CE MCE info in this case because those errors were unlogged, and
as such, that info can be decoded.

But this whole point is moot since there can be at most 32 of those
errors, _and_ only on the _panic_ path. And I don't think this path
matters because it is hit _very_ seldom. I bet you don't hit it on any
of your machines.

And that is not what we want to fix. We want to fix the case of the
occasional CE MCEs which get detected in the polling path: none of
their MCA regs get dumped for decoding, so the decoding hint there is
out of place. And we fixed that, at least partially, so that it doesn't
flood the logs. If you're not fine with the default ratelimit of 10 msgs
per 5 seconds, we can always raise it, but tweaking an almost
hypothetical case is just not worth it.
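
To make the "10 msgs per 5 seconds" behaviour concrete, here is a rough
userspace sketch of window-based ratelimiting. It is only loosely
modelled on the idea behind the kernel's ratelimit code and is not the
kernel implementation: struct rl_state and rl_allow() are hypothetical
names, and time is passed in as plain seconds instead of jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a ratelimit state. */
struct rl_state {
	long interval;	/* window length in seconds, e.g. 5 */
	int burst;	/* messages allowed per window, e.g. 10 */
	long begin;	/* start time of the current window */
	int printed;	/* messages let through in this window */
	int missed;	/* messages suppressed in this window */
	bool started;	/* false until the first call opens a window */
};

/* Return true if a message may be emitted at time 'now' (in seconds). */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Open a fresh window on first use or once the interval elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over the burst limit: count the message and drop it. */
	rs->missed++;
	return false;
}
```

With interval = 5 and burst = 10, a storm of 25 messages within one
window lets 10 through and drops 15. Raising the limit in this sketch
just means bumping burst or shrinking interval.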

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632