Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error trace event
From: Borislav Petkov
Date: Thu Aug 15 2013 - 14:41:58 EST
On Thu, Aug 15, 2013 at 06:16:48PM +0000, Luck, Tony wrote:
> > * We parse some APEI table and disable those MCA banks which the BIOS
> > wants to handle first.
>
> We have no idea which errors the BIOS has chosen for itself. We just
> know which bank numbers ...
Well, those which BIOS hasn't chosen for itself simply get handled up
through HEST, I think. So it all goes out in APEI anyway...
> and Intel processors change mappings of which errors are logged in
> which banks in every new processor tock (and sometimes tick). Some
> banks are documented in processor datasheet. Most are not. Most common
> case might well be memory ... but it could be cache, or I/O, or ...
>
> So this doesn't help Mauro figure out whether to allow loading of an
> EDAC driver that will peek and poke at chipset specific registers in
> possibly racy ways with BIOS code doing the same thing.
That doesn't matter - the only thing that matters is if an EDAC driver
has anything additional to bring to the table. If it does, then it gets
to see the errors before they're dumped to userspace. If not, then APEI
should report them directly.

Mind you, if we've disabled an MCA bank for the kernel then no EDAC
driver gets to see errors from it either because APEI has taken
responsibility. Unless said driver is poking around MCA registers -
which it shouldn't.

So I'd guess the decision to load an EDAC driver should be a platform
one. A platform which gives *sufficient* information in APEI tables for
an error doesn't need an EDAC driver. Older platforms or platforms which
cannot supply sufficient information for, say, properly pinpointing the
DIMM, should use the additional help of an EDAC driver for that, if
possible.
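
To make the platform decision above concrete, here is a rough sketch of
the policy in plain C. Everything in it is an assumption made for
illustration: the flag names, the definition of "sufficient" (node, card
and module all reported by firmware) and the two-way reporter choice are
hypothetical and are not taken from the kernel's APEI or EDAC code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical flags: which location fields the firmware-provided
 * error record marks as valid. These are NOT kernel definitions.
 */
enum fw_mem_field {
	FW_MEM_PHYS_ADDR = 1u << 0,
	FW_MEM_NODE      = 1u << 1,
	FW_MEM_CARD      = 1u << 2,
	FW_MEM_MODULE    = 1u << 3,
};

/* Assumed meaning of "sufficient": enough fields to name the DIMM. */
#define FW_MEM_DIMM_FIELDS (FW_MEM_NODE | FW_MEM_CARD | FW_MEM_MODULE)

enum err_reporter {
	REPORT_VIA_APEI,	/* firmware data is reported directly */
	REPORT_VIA_EDAC,	/* EDAC driver decodes the error first */
};

/* True if every field needed to pinpoint the DIMM is present. */
static bool fw_info_sufficient(uint32_t valid_fields)
{
	return (valid_fields & FW_MEM_DIMM_FIELDS) == FW_MEM_DIMM_FIELDS;
}

/*
 * Platform policy: prefer APEI when it can pinpoint the DIMM by itself,
 * otherwise use an EDAC driver if one exists for the platform. With no
 * EDAC driver available, APEI output is all there is.
 */
static enum err_reporter pick_reporter(uint32_t valid_fields,
				       bool edac_available)
{
	if (fw_info_sufficient(valid_fields))
		return REPORT_VIA_APEI;

	return edac_available ? REPORT_VIA_EDAC : REPORT_VIA_APEI;
}
```

With these assumptions, a platform reporting only a physical address
would end up with the EDAC driver when one is available, while a
platform reporting node, card and module would not need one.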

Which begs the most important question: do we even have a platform that
can give us sufficient information without the need for an EDAC driver?

Because if not, we should stop wasting energy pointlessly and simply
drop this discussion: we basically load an EDAC driver and do not do the
APEI tracepoint because it simply doesn't make any sense and there's no
actual platform giving us that info.

So, which is it?

--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.