Re: linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot)

From: John Garry
Date: Tue Nov 26 2019 - 04:59:11 EST


On 22/11/2019 11:28, Robert Richter wrote:
On 21.11.19 15:23:42, John Garry wrote:
On 21/11/2019 14:23, Robert Richter wrote:
On 21.11.19 12:34:22, John Garry wrote:

[ 22.046666] EDAC MC: bug in low-level driver: attempt to assign
[ 22.046666] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.058311] ghes_edac: Can't register at EDAC core
[ 22.065402] EDAC MC: bug in low-level driver: attempt to assign
[ 22.065402] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.077080] ghes_edac: Can't register at EDAC core
[ 22.084140] EDAC MC: bug in low-level driver: attempt to assign
[ 22.084140] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.095789] ghes_edac: Can't register at EDAC core
[ 22.102873] EDAC MC: bug in low-level driver: attempt to assign
[ 22.102873] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.115442] ghes_edac: Can't register at EDAC core
[ 22.122536] EDAC MC: bug in low-level driver: attempt to assign
[ 22.122536] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.134344] ghes_edac: Can't register at EDAC core
[ 22.141441] EDAC MC: bug in low-level driver: attempt to assign
[ 22.141441] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.153089] ghes_edac: Can't register at EDAC core
[ 22.160161] EDAC MC: bug in low-level driver: attempt to assign
[ 22.160161] duplicate mc_idx 0 in add_mc_to_global_list()
[ 22.171810] ghes_edac: Can't register at EDAC core

What I am more concerned about is this. In total this implies 8 ghes
users that all try to register a (single-instance) ghes mc device. For
non-x86 only one instance is allowed (see ghes_edac_register(), idx =
0).

I also looked into this: With refcount_inc_checked() enabled, the
refcount is *not* increased from 0 to 1.

Yeah, I had quickly checked this back then and I think you're right.

Thanks,
John

Under the hood only
refcount_inc_not_zero() is called instead of refcount_inc(). So the
refcount is still zero after an edac mc device was registered. Instead
of sharing the edac mc device, the driver tries to allocate another mc
device for each GHESv2 entry in the HEST table. This causes the
'duplicate mc_idx' message. Also, it is ok to have multiple GHESv2
entries (your system seems to have 8 entries), e.g. to serve different
kinds of errors in the system.
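
To illustrate the effect described above, here is a minimal userspace
sketch. It is not the ghes_edac code: struct refcount, the model_*()
helpers, mc_registered and count_failures() are all hypothetical names
made up for this example. It assumes that an increment which refuses to
go from 0 to 1 leaves the counter at zero, so every later caller takes
the "first user" path again and hits the duplicate index check. The
use_set_on_first variant, which sets the counter to 1 explicitly for
the first user, is only one possible way to avoid that and is an
assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's refcount_t; not the real type. */
struct refcount {
	unsigned int refs;
};

/* Models refcount_inc_not_zero(): refuses to increment from zero. */
static bool model_inc_not_zero(struct refcount *r)
{
	if (r->refs == 0)
		return false;
	r->refs++;
	return true;
}

/* Models an explicit 0 -> 1 transition for the first user. */
static void model_set_one(struct refcount *r)
{
	r->refs = 1;
}

static struct refcount mc_refcount;
static bool mc_registered;	/* models "mc_idx 0 is already in the list" */

/*
 * Hypothetical register path for a single-instance mc device.
 * Returns 0 if the caller created or shares the device, and -1 if it
 * tried to add a second device with the same index.
 */
static int model_register(bool use_set_on_first)
{
	/* Later users are supposed to share the existing device. */
	if (model_inc_not_zero(&mc_refcount))
		return 0;

	/* Counter is zero, so this caller believes it is the first user. */
	if (mc_registered)
		return -1;	/* duplicate mc_idx 0 */

	mc_registered = true;
	if (use_set_on_first)
		model_set_one(&mc_refcount);
	else
		(void)model_inc_not_zero(&mc_refcount);	/* stays at zero */

	return 0;
}

/* Registers nr_entries users in a row and counts the failed attempts. */
static int count_failures(int nr_entries, bool use_set_on_first)
{
	int failures = 0;

	assert(nr_entries >= 0);
	mc_refcount.refs = 0;
	mc_registered = false;

	for (int i = 0; i < nr_entries; i++)
		if (model_register(use_set_on_first) != 0)
			failures++;

	return failures;
}
```

In this model, count_failures(8, false) gives 7 failed registrations
for 8 entries, while count_failures(8, true) gives none because users
2 to 8 share the device created by the first one.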

Thanks,

-Robert