Re: [PATCH 1/1] EDAC, sb_edac: Fix channel reporting on Knights Landing
From: Borislav Petkov
Date: Wed Aug 03 2016 - 05:06:29 EST
On Wed, Aug 03, 2016 at 09:58:38AM +0200, Borislav Petkov wrote:
> "Simple" is my speciality! :-)))
...and so I did simplify it a bit more. Here's the final result:
---
From: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
Date: Sat, 23 Jul 2016 01:44:49 +0200
Subject: [PATCH] EDAC, sb_edac: Fix channel reporting on Knights Landing
On Intel Xeon Phi Knights Landing processor family the channels of the
memory controller have untypical arrangement - MC0 is mapped to CH3,4,5
and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the
channel name incorrectly.
We missed this change earlier, so the code already contains similar
comment, but the translation function is incorrect.
Without this patch:
errors in DIMM_A and DIMM_D were reported in DIMM_D
errors in DIMM_B and DIMM_E were reported in DIMM_E
errors in DIMM_C and DIMM_F were reported in DIMM_F
Correct this.
Hubert Chrzaniuk:
- rebased to 4.8
- comments and code cleanup
Fixes: d0cdf9003140 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
Cc: Mauro Carvalho Chehab <mchehab@xxxxxxxxxx>
Cc: Hubert Chrzaniuk <hubert.chrzaniuk@xxxxxxxxx>
Cc: linux-edac <linux-edac@xxxxxxxxxxxxxxx>
Cc: lukasz.anaczkowski@xxxxxxxxx
Cc: lukasz.odzioba@xxxxxxxxx
Cc: mchehab@xxxxxxxxxx
Cc: <stable@xxxxxxxxxxxxxxx> # v4.5..
Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@xxxxxxxxx
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@xxxxxxxxx>
[ Boris: Simplify a bit by removing char mc. ]
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
---
drivers/edac/sb_edac.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 4fb2eb7c800d..ce0067b7a2f6 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -552,9 +552,9 @@ static const struct pci_id_table pci_dev_descr_haswell_table[] = {
/* Knight's Landing Support */
/*
* KNL's memory channels are swizzled between memory controllers.
- * MC0 is mapped to CH3,5,6 and MC1 is mapped to CH0,1,2
+ * MC0 is mapped to CH3,4,5 and MC1 is mapped to CH0,1,2
*/
-#define knl_channel_remap(channel) ((channel + 3) % 6)
+#define knl_channel_remap(mc, chan) ((mc) ? (chan) : (chan) + 3)
/* Memory controller, TAD tables, error injection - 2-8-0, 2-9-0 (2 of these) */
#define PCI_DEVICE_ID_INTEL_KNL_IMC_MC 0x7840
@@ -1286,7 +1286,7 @@ static u32 knl_get_mc_route(int entry, u32 reg)
mc = GET_BITFIELD(reg, entry*3, (entry*3)+2);
chan = GET_BITFIELD(reg, (entry*2) + 18, (entry*2) + 18 + 1);
- return knl_channel_remap(mc*3 + chan);
+ return knl_channel_remap(mc, chan);
}
/*
@@ -2997,8 +2997,15 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
} else {
char A = *("A");
- channel = knl_channel_remap(channel);
+ /*
+ * Reported channel is in range 0-2, so we can't map it
+ * back to mc. To figure out mc we check machine check
+ * bank register that reported this error.
+ * bank15 means mc0 and bank16 means mc1.
+ */
+ channel = knl_channel_remap(m->bank == 16, channel);
channel_mask = 1 << channel;
+
snprintf(msg, sizeof(msg),
"%s%s err_code:%04x:%04x channel:%d (DIMM_%c)",
overflow ? " OVERFLOW" : "",
--
2.8.4
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--