Re: [PATCH v3 0/5] AMD64 EDAC: Check for nodes without memory, etc.

From: Borislav Petkov
Date: Wed Nov 06 2019 - 14:54:24 EST


On Wed, Nov 06, 2019 at 06:16:12PM +0000, Ghannam, Yazen wrote:
> We had a thread before about userspace loading the module multiple times on
> failure:
> https://lore.kernel.org/linux-edac/20190822005020.GA403@xxxxxxxxxx/
>
> I tried to look into it a bit, but I didn't get very far.

Right, I'll try to have a look soon, as it reproduces here.

> So is the behavior you see only happening with the new patchset applied? That
> may be a clue that we can fix this in the module.

Actually, it did try twice before your patchset too, and I didn't notice
it then because it didn't spit out as much debug output. It does now
because your patchset pulls the detection up earlier. Without your
patchset we had:

$ dmesg | grep -i edac
[ 2.590869] EDAC MC: Ver: 3.0.0
[ 2.594855] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[ 5.939351] EDAC DEBUG: nb_mce_bank_enabled_on_node: core: 0, MCG_CTL: 0x3f, NB MSR is enabled
[ 5.948488] EDAC DEBUG: nb_mce_bank_enabled_on_node: core: 1, MCG_CTL: 0x3f, NB MSR is enabled
[ 5.957312] EDAC amd64: Node 0: DRAM ECC disabled.
[ 5.967746] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[ 6.031424] EDAC DEBUG: nb_mce_bank_enabled_on_node: core: 0, MCG_CTL: 0x3f, NB MSR is enabled
[ 6.042173] EDAC DEBUG: nb_mce_bank_enabled_on_node: core: 1, MCG_CTL: 0x3f, NB MSR is enabled
[ 6.052253] EDAC amd64: Node 0: DRAM ECC disabled.
[ 6.057804] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.

which is also two load attempts.
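To check the attempt count in a log like the one above without eyeballing it, one can count the per-attempt failure line. This is a minimal sketch that assumes each failed probe prints exactly one "module will not load" message, which holds for the log above. The function name count_load_attempts is hypothetical and not part of any EDAC tooling.

```python
# Hypothetical helper: count failed amd64_edac load attempts in dmesg text.
# Assumption: each failed probe prints exactly one line containing this marker.
FAIL_MARKER = "module will not load"


def count_load_attempts(log: str) -> int:
    """Return the number of lines in `log` that contain FAIL_MARKER."""
    return sum(1 for line in log.splitlines() if FAIL_MARKER in line)
```

Feeding it the captured output of `dmesg` from the run above should give 2, matching the two attempts.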

Anyway, I'll queue your set and try to debug that thing because it
is slowly getting on my nerves...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette