RE: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac

From: Luck, Tony
Date: Wed Jul 19 2017 - 11:14:40 EST


> "The module number of the memory error location. (NODE, CARD, and MODULE
> should provide the information necessary to identify the failing FRU)."
>
> So this tuple is sufficient to pinpoint the DIMM, IIUC.
>
> Which means, ghes_edac can have a single layer of DIMMs without channels.

The tricky part is that you have to rely on SMBIOS/DMI to know what DIMMs are
on the system when the driver initializes so you can populate /sys/.*/edac.

Later, when GHES gives you a NODE/CARD/MODULE tuple in an error record, you
need to match it up with one of those DIMMs. But SMBIOS only gave you two
strings, "Locator" and "Bank Locator", which have no defined syntax. You are at
the mercy of the BIOS writer to put in something parseable. Some writers use
zero-based counts, others are Fortran fans and use one-based. Still others use
letters. About the one guarantee is that they will make almost no effort to
match the silkscreen labels on the motherboard itself.

E.g. my Broadwell-EX has things like:

Locator: CHANNEL D DIMM 1
Bank Locator: Memriser8

Channel is A,B,C,D. DIMM is 0, 1, 2. Memriser is {1..8} so this manages to use all
three counting options!
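
To make the mismatch concrete, here is a rough sketch of what normalizing
those two strings to zero-based indices would take. It is illustration only,
not anything ghes_edac does: the CONVENTIONS table and both function names are
made up for this example, and the table is exactly the per-platform knowledge
that SMBIOS does not give you. How the resulting indices relate to the
NODE/CARD/MODULE values in a GHES record is a separate question this does not
answer.

```python
import re

# Hypothetical per-platform table: which counting style each field uses.
# These entries are read off the Broadwell-EX example above; another BIOS
# would need a different table, which is the whole problem.
CONVENTIONS = {
    "CHANNEL": "letter",      # A, B, C, D
    "DIMM": "zero_based",     # 0, 1, 2
    "MEMRISER": "one_based",  # 1..8
}


def normalize(value, style):
    """Convert one locator field to a zero-based index."""
    if style == "letter":
        if not value.isalpha():
            raise ValueError(f"expected a letter, got {value!r}")
        return ord(value.upper()) - ord("A")
    if style == "zero_based":
        return int(value)
    if style == "one_based":
        return int(value) - 1
    raise ValueError(f"unknown counting style {style!r}")


def parse_locators(locator, bank_locator):
    """Pull the known fields out of the two free-form SMBIOS strings."""
    text = f"{locator} {bank_locator}"
    result = {}
    for name, style in CONVENTIONS.items():
        # Field name, optional whitespace, then a single letter or a number.
        m = re.search(rf"{name}\s*([A-Za-z]|\d+)\b", text, re.IGNORECASE)
        if m:
            result[name.lower()] = normalize(m.group(1), style)
    return result


print(parse_locators("CHANNEL D DIMM 1", "Memriser8"))
```

Fields that the table does not know about are silently skipped, so a BIOS that
spells things differently just produces an empty or partial result.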

-Tony