Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

From: Borislav Petkov
Date: Sat Jun 08 2019 - 05:10:24 EST


On Sat, Jun 08, 2019 at 10:16:11AM +1000, Benjamin Herrenschmidt wrote:
> Those IP blocks don't need any SW coordination at runtime. The drivers
> don't share data nor communicate with each other. There is absolultely
> no reason to go down that path.

Let me set one thing straight: the EDAC "subsystem" if you will - or
that pile of code which does error counting and reporting - has its
limitations in supporting one EDAC driver per platform. And whenever we
have two drivers loadable on a platform, we have to do dirty hacks like

301375e76432 ("EDAC: Add owner check to the x86 platform drivers")

What that means is, that if you need to call EDAC logging routines or
whatnot from two different drivers, there's no locking, no nothing. So
it might work or it might set your cat on fire.

IOW, having multiple separate "drivers" or representations of RAS
functionality using EDAC facilities is something that hasn't been
done. Well, almost. highbank_mc_edac.c and highbank_l2_edac.c is one
example but they make sure they don't step on each other's toes by using
different EDAC pieces - a device vs a memory controller abstraction.

And now the moment all of a sudden you decide you want for those
separate "drivers" to synchronize on something, you need to do something
hacky like the amd_register_ecc_decoder() thing, for example, because we
need to call into the EDAC memory controller driver to decode a DRAM ECC
error properly, while the rest of the error types get decoded somewhere
else...

Then there comes the issue with code reuse - wouldn't it be great if a
memory controller driver can be shared between platform drivers instead of
copying it in both?

We already do that - see fsl_ddr_edac.c which gets shared between PPC
*and* ARM. drivers/edac/skx_common.c is another example for Intel chips.

Now, if you have a platform with 10 IP blocks which each have RAS
functionality, are you saying you'll do 10 different pieces called

<platform_name>_<ip_block#>_edac.c

?

And if <next_platform> has an old IP block with the old RAS
functionality, you load <platform_name>_<ip_block>_edac.c on the new
platform too?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.