Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

From: Benjamin Herrenschmidt
Date: Fri Jun 07 2019 - 20:27:11 EST

On Fri, 2019-06-07 at 16:11 +0100, James Morse wrote:
> I'm coming at this from somewhere else. This stuff has to be considered all the way
> through the system. Just because each component supports error detection, doesn't mean you
> aren't going to get silent corruption. Likewise if another platform picks up two piecemeal
> edac drivers for hardware it happens to have in common with yours, it doesn't mean we're
> counting all the errors. This stuff has to be viewed for the whole platform.

Sure, but you don't solve that problem by having a magic myplatform.c
overseer. And even if you do, it can perfectly well access the individual
IP block drivers, finding them via phandles in the DT for example,
without having to make those individual drivers dependent on some
overarching machine-wide probing mechanism.

> But this doesn't give you a device you can bind a driver to, to kick this stuff off.
> This (I assume) is why you added a dummy 'edac_l1_l2' node, that just probes the driver.
> The hardware is to do with the CPU and caches, 'edac_l1'_l2' doesn't correspond to any
> distinct part of the soc.
> The request is to use the machine compatible, not a dummy node. This wraps up the firmware
> properties too, and any other platform property we don't know about today.
> Once you have this, you don't really need the cpu/cache integration annotations, and your
> future memory-controller support can be picked up as part of the platform driver.
> If you have otherwise identical platforms with different memory controllers, OF gives you
> the API to match the node in the DT.

Dummy nodes are perfectly fine, and have been since the early days of Open
Firmware. That said, these aren't so much dummy nodes as a way to expose
the control path to the caches. The DT isn't perfect in its structure, and
the way caches and CPUs are represented makes it difficult to represent
an arbitrary control path to them without extra nodes, which is thus what
people do.
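
As a rough illustration only, such an extra node might look something
like the fragment below. This is a sketch under assumptions, not the
binding from the patch: the "vendor,cache-edac" compatible, the unit
address, and the "cpus" and "cache" properties on the extra node are all
hypothetical names made up for the example. Only "next-level-cache" and
the generic "cache" compatible are standard.

```dts
/* Hypothetical sketch; names and addresses are invented for illustration. */
cpus {
	cpu0: cpu@0 {
		device_type = "cpu";
		reg = <0x0>;
		next-level-cache = <&l2>;	/* standard cache topology link */
	};

	l2: l2-cache {
		compatible = "cache";
	};
};

soc {
	/* Extra node describing the control path, not the cache itself */
	cache-edac@f0000000 {
		compatible = "vendor,cache-edac";	/* hypothetical */
		reg = <0xf0000000 0x1000>;		/* hypothetical */
		cpus = <&cpu0>;				/* hypothetical phandle property */
		cache = <&l2>;				/* hypothetical phandle property */
	};
};
```

A driver binds to the extra node in the usual way and follows the
phandles to learn which CPUs and caches it covers, so nothing
machine-wide is needed to kick it off.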