Re: [PATCH] EDAC: expose per-dimm error counts in sysfs

From: Aaron Miller
Date: Fri Oct 28 2016 - 05:57:06 EST


Whoops, I meant only the 0th slot in each channel.

On 10/27/16, 2:23 PM, "Aaron Miller" <aaronmiller@xxxxxx> wrote:

If your system is like the one I'm testing on, only the channel 0 DIMM slots are populated, and you injected an error for an unpopulated slot, for which no dimmX directory gets created.

In edac_mc_sysfs.c:

	for (i = 0; i < mci->tot_dimms; i++) {
		struct dimm_info *dimm = mci->dimms[i];
		/* Only expose populated DIMMs */
		if (!dimm->nr_pages)
			continue;


I can repro what you saw here:

$ cd /sys/devices/system/edac/mc/mc0
$ grep . dimm*/*location
dimm0/dimm_location:channel 0 slot 0
dimm3/dimm_location:channel 1 slot 0
dimm6/dimm_location:channel 2 slot 0
dimm9/dimm_location:channel 3 slot 0

$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject_channel
$ echo 2 > /sys/kernel/debug/edac/mc0/fake_inject_slot
$ echo 3 > /sys/kernel/debug/edac/mc0/fake_inject_count
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject
$ cat ce_count
3

$ grep . dimm*/*ce_count
dimm0/dimm_ce_count:0
dimm3/dimm_ce_count:0
dimm6/dimm_ce_count:0
dimm9/dimm_ce_count:0
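
To see why this injection leaves every per-DIMM counter at 0, it can help to map the injected (channel, slot) pair to a dimmX index and check whether that directory exists. The following is a minimal shell sketch, not anything from the patch. It assumes the index is channel * slots_per_channel + slot with 3 slots per channel, which is only inferred from the dimm_location listing above (dimm3 is channel 1 slot 0), not taken from the driver. The helper names dimm_index and dimm_is_exposed are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map (channel, slot) to the dimmX index.
# ASSUMPTION: index = channel * slots_per_channel + slot, with 3 slots per
# channel inferred from the listing (dimm0, dimm3, dimm6, dimm9 are slot 0).
dimm_index() {
    # $1 = channel, $2 = slot, $3 = slots per channel (defaults to 3)
    echo $(( $1 * ${3:-3} + $2 ))
}

# Hypothetical helper: succeed if the dimmX directory for (channel, slot)
# exists under the given memory controller sysfs directory.
dimm_is_exposed() {
    # $1 = mc sysfs directory, $2 = channel, $3 = slot
    [ -d "$1/dimm$(dimm_index "$2" "$3")" ]
}

mc=/sys/devices/system/edac/mc/mc0
for slot in 0 2; do
    idx=$(dimm_index 1 "$slot")
    if dimm_is_exposed "$mc" 1 "$slot"; then
        echo "channel 1 slot $slot -> dimm$idx (exposed)"
    else
        echo "channel 1 slot $slot -> dimm$idx (no sysfs directory)"
    fi
done
```

Under that assumption, channel 1 slot 2 maps to dimm5, which has no directory in the listing above, while channel 1 slot 0 maps to dimm3.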


And I get what I expect for a populated slot:

$ echo 0 > /sys/kernel/debug/edac/mc0/fake_inject_slot
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject
$ cat ce_count
6

$ grep . dimm*/*ce_count
dimm0/dimm_ce_count:0
dimm3/dimm_ce_count:3
dimm6/dimm_ce_count:0
dimm9/dimm_ce_count:0



On 10/27/16, 11:07 AM, "Borislav Petkov" <bp@xxxxxxxxx> wrote:

On Tue, Oct 25, 2016 at 04:25:51PM -0700, Aaron Miller wrote:

<--- This patch needs a commit message.

Especially as to *why* we need this.

> Signed-off-by: Aaron Miller <aaronmiller@xxxxxx>
> ---
> drivers/edac/edac_mc_sysfs.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)

Regardless, something's still not right yet:

$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject_channel
$ echo 2 > /sys/kernel/debug/edac/mc0/fake_inject_slot
$ echo 3 > /sys/kernel/debug/edac/mc0/fake_inject_count
       ^

$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject

$ grep . /sys/devices/system/edac/mc/mc0/*count
/sys/devices/system/edac/mc/mc0/ce_count:3
                                         ^

/sys/devices/system/edac/mc/mc0/ce_noinfo_count:0
/sys/devices/system/edac/mc/mc0/ue_count:0
/sys/devices/system/edac/mc/mc0/ue_noinfo_count:0

$ grep -r . /sys/devices/system/edac/mc/mc0/dimm*/* 2>/dev/null | grep ce_count
/sys/devices/system/edac/mc/mc0/dimm0/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/dimm9/dimm_ce_count:0
                                                    ^

There should be 3 somewhere in the DIMM counters...

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.