[tip: x86/cpu] EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId

From: tip-bot2 for Yazen Ghannam
Date: Thu Nov 19 2020 - 06:29:41 EST


The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: 8de0c9917cc1297bc5543b61992d5bdee4ce621a
Gitweb: https://git.kernel.org/tip/8de0c9917cc1297bc5543b61992d5bdee4ce621a
Author: Yazen Ghannam <yazen.ghannam@xxxxxxx>
AuthorDate: Mon, 09 Nov 2020 21:06:58
Committer: Borislav Petkov <bp@xxxxxxx>
CommitterDate: Thu, 19 Nov 2020 11:43:21 +01:00

EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeId

The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
later systems. This function is used in amd64_edac_mod to do
system-specific decoding for DRAM ECC errors. The function takes a
"NodeId" as a parameter.

In AMD documentation, NodeId is used to identify a physical die in a
system. It can be used to identify a node in the AMD_NB code, and it is
also used with umc_normaddr_to_sysaddr().

However, the input used for decode_dram_ecc() is currently the NUMA node
of a logical CPU. In the default configuration, the NUMA node and
physical die will be equivalent, so this doesn't have an impact.

But the NUMA node configuration can be adjusted with optional memory
interleaving modes. This will cause the NUMA node enumeration to not
match the physical die enumeration. The mismatch will cause the address
translation function to fail or report incorrect results.

Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the
physical ID is used.
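
The difference between the two IDs can be sketched with a toy user-space
model. This is only an illustration, not kernel code: struct toy_cpu, the
toy_* helpers and the four-CPU table are hypothetical, and the table assumes
an interleaving mode that folds two physical dies into a single NUMA node.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types and helpers, not kernel APIs. */
struct toy_cpu {
	int numa_node;	/* stands in for what cpu_to_node() reports */
	int die_id;	/* stands in for what topology_die_id() reports */
};

/*
 * Assumed example system: two physical dies, four logical CPUs, and an
 * interleaving mode that exposes both dies as one NUMA node.
 */
static const struct toy_cpu toy_cpus[] = {
	{ .numa_node = 0, .die_id = 0 },
	{ .numa_node = 0, .die_id = 0 },
	{ .numa_node = 0, .die_id = 1 },
	{ .numa_node = 0, .die_id = 1 },
};

#define TOY_NR_CPUS (sizeof(toy_cpus) / sizeof(toy_cpus[0]))

/* NUMA node of a logical CPU in the toy table. */
static int toy_cpu_to_node(size_t cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return toy_cpus[cpu].numa_node;
}

/* Physical die of a logical CPU in the toy table. */
static int toy_die_id(size_t cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return toy_cpus[cpu].die_id;
}

/* Returns 1 if the NUMA node number happens to equal the die ID. */
static int toy_node_matches_die(size_t cpu)
{
	return toy_cpu_to_node(cpu) == toy_die_id(cpu);
}
```

In this model the CPUs on die 1 report NUMA node 0, so a caller that passes
the NUMA node as the NodeId would decode the error against the wrong die.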

Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@xxxxxxx
---
 drivers/edac/mce_amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 85095e3..5dd905a 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1003,7 +1003,7 @@ static void decode_smca_error(struct mce *m)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

 	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(cpu_to_node(m->extcpu), m);
+		decode_dram_ecc(topology_die_id(m->extcpu), m);
 }

 static inline void amd_decode_err_code(u16 ec)