Re: [PATCH] EDAC: expose per-dimm error counts in sysfs
From: Aaron Miller
Date: Fri Oct 28 2016 - 05:57:06 EST
Whoops, I meant only the 0th slot in each channel.
On 10/27/16, 2:23 PM, "Aaron Miller" <aaronmiller@xxxxxx> wrote:
If your system is like the one I'm testing on, only the channel 0 DIMM slots are populated, and you injected an error for an unpopulated slot, for which no dimmX directory gets created.
In edac_mc_sysfs.c:
    for (i = 0; i < mci->tot_dimms; i++) {
        struct dimm_info *dimm = mci->dimms[i];
        /* Only expose populated DIMMs */
        if (!dimm->nr_pages)
            continue;
I can repro what you saw here:
$ cd /sys/devices/system/edac/mc/mc0
$ grep . dimm*/*location
dimm0/dimm_location:channel 0 slot 0
dimm3/dimm_location:channel 1 slot 0
dimm6/dimm_location:channel 2 slot 0
dimm9/dimm_location:channel 3 slot 0
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject_channel
$ echo 2 > /sys/kernel/debug/edac/mc0/fake_inject_slot
$ echo 3 > /sys/kernel/debug/edac/mc0/fake_inject_count
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject
$ cat ce_count
3
$ grep . dimm*/*ce_count
dimm0/dimm_ce_count:0
dimm3/dimm_ce_count:0
dimm6/dimm_ce_count:0
dimm9/dimm_ce_count:0
And I get what I expect for a populated slot:
$ echo 0 > /sys/kernel/debug/edac/mc0/fake_inject_slot
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject
$ cat ce_count
6
$ grep . dimm*/*ce_count
dimm0/dimm_ce_count:0
dimm3/dimm_ce_count:3
dimm6/dimm_ce_count:0
dimm9/dimm_ce_count:0
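The dimm_location listing above is consistent with channel-major numbering and three slots per channel (dimm3 is channel 1 slot 0, dimm6 is channel 2 slot 0). Under that assumption, which is inferred from this one machine and not from the driver's layer configuration, a small sketch of the mapping shows why the first injection is not visible: channel 1 slot 2 would land on dimm5, which has no directory. The helper name dimm_index is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map (channel, slot) to the N in dimmN.
# ASSUMPTION: channel-major numbering with a fixed number of slots per
# channel, as suggested by the dimm_location output on this machine.
dimm_index() {
    channel=$1
    slot=$2
    slots_per_channel=$3
    echo $((channel * slots_per_channel + slot))
}

# Example calls (3 slots per channel, as on the test box):
#   dimm_index 1 2 3    # unpopulated slot used in the first injection
#   dimm_index 1 0 3    # populated slot used in the second injection
```

With three slots per channel the first call gives 5 and the second gives 3, which matches the counter showing up under dimm3 only.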
On 10/27/16, 11:07 AM, "Borislav Petkov" <bp@xxxxxxxxx> wrote:
On Tue, Oct 25, 2016 at 04:25:51PM -0700, Aaron Miller wrote:
<--- This patch needs a commit message.
Especially as to *why* we need this.
> Signed-off-by: Aaron Miller <aaronmiller@xxxxxx>
> ---
> drivers/edac/edac_mc_sysfs.c | 38 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 38 insertions(+)
Regardless, something's still not right yet:
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject_channel
$ echo 2 > /sys/kernel/debug/edac/mc0/fake_inject_slot
$ echo 3 > /sys/kernel/debug/edac/mc0/fake_inject_count
       ^
$ echo 1 > /sys/kernel/debug/edac/mc0/fake_inject
$ grep . /sys/devices/system/edac/mc/mc0/*count
/sys/devices/system/edac/mc/mc0/ce_count:3
                                         ^
/sys/devices/system/edac/mc/mc0/ce_noinfo_count:0
/sys/devices/system/edac/mc/mc0/ue_count:0
/sys/devices/system/edac/mc/mc0/ue_noinfo_count:0
$ grep -r . /sys/devices/system/edac/mc/mc0/dimm*/* 2>/dev/null | grep ce_count
/sys/devices/system/edac/mc/mc0/dimm0/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/dimm3/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/dimm6/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/dimm9/dimm_ce_count:0
                                                    ^
There should be 3 somewhere in the DIMM counters...
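One way to spot counts that were attributed to a DIMM with no sysfs directory is to compare the controller total against the sum of the per-DIMM counters. This is a minimal sketch, assuming the dimm_ce_count attribute from this patch and the directory layout shown above; the function name unattributed_ce is made up for illustration.

```shell
#!/bin/sh
# Hypothetical check: print ce_count minus the sum of all dimm*/dimm_ce_count
# files under one memory controller directory.
# ASSUMPTION: per-DIMM counters are exposed as dimmN/dimm_ce_count.
unattributed_ce() {
    mc_dir=$1
    total=$(cat "$mc_dir/ce_count")
    sum=0
    for f in "$mc_dir"/dimm*/dimm_ce_count; do
        # Skip the literal pattern if the glob matched nothing.
        [ -r "$f" ] || continue
        sum=$((sum + $(cat "$f")))
    done
    echo $((total - sum))
}

# Example call on a live system:
#   unattributed_ce /sys/devices/system/edac/mc/mc0
```

With the values shown above (ce_count 3, every dimm_ce_count 0) the difference is 3, meaning the injected errors were counted at the controller level but not on any exposed DIMM.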
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.