Re: [RFC PATCH] EDAC, ghes: Enable per-layer error reporting for ARM

From: Borislav Petkov
Date: Thu Aug 30 2018 - 06:30:28 EST


On Wed, Aug 29, 2018 at 11:20:48AM +0100, James Morse wrote:
> Right. I'd like ghes-edac to work in the same way for both architectures.
>
> I think this is best done by stuffing the dmi-handle in struct dimm_info during
> ghes_edac_dmidecode(), then populating the struct edac_raw_error_desc layers
> from the matching mci->dimms 'location'.
>
> For EDAC_MC_LAYER_ALL_MEM this boils down to a flat index, so pointer arithmetic
> on mci->dimms is an appropriate short cut.

It all sounds nice on paper but you should try it on a couple of
machines first. See whether/how it actually works there.
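
FWIW, the handle lookup described above would be something along the
lines of the below. This is a userspace sketch only: the sketch_* structs
and functions are made-up stand-ins for struct dimm_info and struct
mem_ctl_info, not the real EDAC definitions, and the linear search and
the -1 "unknown" marker are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LAYERS 3

/* Hypothetical stand-in for struct dimm_info, not the kernel definition. */
struct sketch_dimm {
	uint16_t smbios_handle;	/* recorded while walking the DMI table */
	unsigned int location[SKETCH_MAX_LAYERS];
};

/* Hypothetical stand-in for struct mem_ctl_info. */
struct sketch_mci {
	unsigned int n_layers;
	size_t tot_dimms;
	struct sketch_dimm *dimms;
};

/* Linear search; returns NULL if the handle was never recorded. */
static struct sketch_dimm *sketch_find_dimm(const struct sketch_mci *mci,
					    uint16_t handle)
{
	for (size_t i = 0; i < mci->tot_dimms; i++)
		if (mci->dimms[i].smbios_handle == handle)
			return &mci->dimms[i];
	return NULL;
}

/*
 * Fill layers[] from the matching DIMM's location. Returns false when
 * there is no match, so the caller can fall back to reporting without
 * per-layer information.
 */
static bool sketch_fill_layers(const struct sketch_mci *mci, uint16_t handle,
			       int *layers)
{
	const struct sketch_dimm *dimm;

	assert(mci && layers && mci->n_layers <= SKETCH_MAX_LAYERS);

	dimm = sketch_find_dimm(mci, handle);
	for (unsigned int i = 0; i < mci->n_layers; i++)
		layers[i] = dimm ? (int)dimm->location[i] : -1;

	return dimm != NULL;
}
```

With a single all-memory layer, location[0] would simply be the flat
index of the DIMM, which is the shortcut you mention. Whether firmware
hands us usable handles is the part that needs testing on real machines.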

Also, this should probably leave x86 alone unless you wanna fix it
there too. I'd think twice before I attempt such a thing though :)

> (We should probably 'FIXME: It shouldn't be hard to also fill the DIMM labels'
> at the same time so that no-one is tempted to interpret the edac:dimm-idx)

See above.

> > In an ideal world, I'd like to be able to query the SPD chips on the
>
> (oh, that can be done?)

There was some talk initially and I've seen the BIOS read SPD chips and
show DIMM info but I've heard the word "proprietary" a couple of
times. Haven't dug any deeper though.

> I got educated by the people who look after specifications last time I touched
> this [0]. SMBIOS tables are required by Arm's 'Server Base Boot Requirements'.
> It lists the memory-device and physical-memory-array as required.

It is always better to have certification on your side. Should make ARM
vendors dance.

> I will drop them a note that we will be depending on the handle, and it should
> go on the list too... if it's not populated on today's systems we can fall back
> to !e->enable_per_layer_report as we do today.

Right.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--