Re: [PATCH v3 0/5] AMD64 EDAC: Check for nodes without memory, etc.

From: Borislav Petkov
Date: Thu Nov 07 2019 - 05:39:10 EST


On Wed, Nov 06, 2019 at 08:54:17PM +0100, Borislav Petkov wrote:
> which are also two attempts.
>
> Anyway, I'll queue your set and I'll try to debug that thing because it
> is getting on my nerves slowly...

Yah, the problem is that because we have:

MODULE_DEVICE_TABLE(x86cpu, amd64_cpuids);

it gets tried on each CPU because an uevent gets dispatched for each
device, and each CPU is a device.

That's why I see it twice on this box - it has two CPUs.

And Greg says making it attempt once per system can't be done. Unless we
start doing hacks with sending uevents per BSP only which is too much.
Or we can remember the previous return value of the module init function
into edac_core but that's nasty too.

I'm thinking we should simply kill this fat ecc_msg thing which is not
very useful and be done with it:

[ 5.697275] EDAC MC: Ver: 3.0.0
[ 5.909530] EDAC amd64: F10h detected (node 0).
[ 6.345231] EDAC amd64: Node 0: DRAM ECC disabled.
[ 6.370815] EDAC amd64: F10h detected (node 0).
[ 6.370929] EDAC amd64: Node 0: DRAM ECC disabled.

That's probably still a bit annoying on a large machine but better than
nothing.

---
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 3aeb5173e200..0738237e3f09 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3188,18 +3188,6 @@ static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
amd64_warn("Error restoring NB MCGCTL settings!\n");
}

-/*
- * EDAC requires that the BIOS have ECC enabled before
- * taking over the processing of ECC errors. A command line
- * option allows to force-enable hardware ECC later in
- * enable_ecc_error_reporting().
- */
-static const char *ecc_msg =
- "ECC disabled in the BIOS or no ECC capability, module will not load.\n"
- " Either enable ECC checking or force module loading by setting "
- "'ecc_enable_override'.\n"
- " (Note that use of the override may cause unknown side effects.)\n";
-
static bool ecc_enabled(struct amd64_pvt *pvt)
{
u16 nid = pvt->mc_node_id;
@@ -3246,11 +3234,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
amd64_info("Node %d: DRAM ECC %s.\n",
nid, (ecc_en ? "enabled" : "disabled"));

- if (!ecc_en || !nb_mce_en) {
- amd64_info("%s", ecc_msg);
+ if (!ecc_en || !nb_mce_en)
return false;
- }
- return true;
+ else
+ return true;
}

static inline void

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette