Re: [PATCH 12/21] EDAC, ghes: Add support for legacy API counters

From: James Morse
Date: Wed Jun 19 2019 - 13:27:49 EST

Hi Robert,

On 12/06/2019 19:41, Robert Richter wrote:
> On 29.05.19 16:13:02, James Morse wrote:
>> On 29/05/2019 09:44, Robert Richter wrote:
>>> The ghes driver is not able yet to count legacy API counters in sysfs,
>>> e.g.:
>>> /sys/devices/system/edac/mc/mc0/csrow2/ce_count
>>> /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count
>>> /sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count
>>> Make counting csrows/channels generic so that the ghes driver can use
>>> it too.
>> What for?
> With EDAC_LEGACY_SYSFS enabled those counters are exposed to sysfs,
> but the numbers are wrong (all zero).

Excellent, so its legacy and broken.

>> Is this for an arm64 system? Surely we don't have any systems that used to work with these
>> legacy counters. Aren't they legacy because we want new software to stop using them!
> The option is to support legacy userland. If we want to provide a> similar "user experience" as for x86 the counters should be correct.

The flip-side is arm64 doesn't have the same baggage. These counters have never worked
with this driver (even on x86).

This ghes driver also probes on HPE Server platform, so the architecture isn't really
relevant. (I was curious why Marvell care).

> Of course it is not a real mapping to csrows, but it makes that i/f
> work.

(...which stinks)

> In any case, this patch cleans up code as old API's counter code is
> isolated and moved to common code. Making the counter's work for ghes
> is actually a side-effect here. The cleanup is a prerequisit for
> follow on patches.

I'm all for removing/warning-its-broken it when ghes_edac is in use. But the convincing
argument is debian ships a 'current' version of edac-utils that predates 199747106934,
(that made all this fake csrow stuff deprecated), and debian's popcon says ~1000 people
have it installed.

If you want it fixed, please don't do it as a side effect of cleanup. Fixes need to be a
small separate series that can be backported. (unless we're confident no-one uses it, in
which case, why fix it?)