RE: [PATCH v2 0/7] AMD64 EDAC fixes

From: Ghannam, Yazen
Date: Thu Aug 15 2019 - 16:08:45 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-owner@xxxxxxxxxxxxxxx> On Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 9:46 AM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2 0/7] AMD64 EDAC fixes
>
...
>
> So this still has this confusing reporting of unpopulated nodes:
>
> [ 4.291774] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
> [ 4.292021] EDAC DEBUG: ecc_enabled: Node 2: No enabled UMCs.
> [ 4.292231] EDAC amd64: Node 2: DRAM ECC disabled.
> [ 4.292405] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> [ 4.292859] EDAC DEBUG: ecc_enabled: Node 3: No enabled UMCs.
> [ 4.292963] EDAC amd64: Node 3: DRAM ECC disabled.
> [ 4.293063] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> [ 4.293347] AMD64 EDAC driver v3.5.0
>
> which needs fixing.
>

Yes, I agree. I was planning to do a fix in a separate set. Is that okay? Or should I add it here?

> Regardless, still not good enough. The Snowy Owl box I have here has 16
> GB:
>
> $ head -n1 /proc/meminfo
> MemTotal: 15715328 kB
>
> and yet
>
> [ 4.282251] EDAC MC: UMC0 chip selects:
> [ 4.282348] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [ 4.282455] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.282592] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.282732] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [ 4.282839] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.283060] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.283286] EDAC amd64: MC: 0: 8191MB 1: 8191MB
> ^^^^^^^^^^^^^^^^^
>
> [ 4.283456] EDAC amd64: MC: 2: 0MB 3: 0MB
>
> ...
>
> [ 4.285379] EDAC MC: UMC1 chip selects:
> [ 4.285476] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [ 4.285583] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.285721] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.285860] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [ 4.285967] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.286105] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.286244] EDAC amd64: MC: 0: 8191MB 1: 8191MB
> ^^^^^^^^^^^^^^^^^
>
> [ 4.286345] EDAC amd64: MC: 2: 0MB 3: 0MB
>
> which shows 4 chip selects x 8GB = 32GB.
>
> So something's still wrong. Before the patchset it says:
>
> EDAC MC: UMC0 chip selects:
> EDAC amd64: MC: 0: 8192MB 1: 0MB
> ...
> EDAC MC: UMC1 chip selects:
> EDAC amd64: MC: 0: 8192MB 1: 0MB
>
> which is the correct output.
>
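
To make the mismatch concrete, the sizes in the quoted logs can be tallied and compared against MemTotal. This is only a rough sketch: the values are copied from the quoted output above, and the names (AFTER_PATCHSET_MB, BEFORE_PATCHSET_MB, total_mb) are made up for illustration and are not part of the driver. The "before" log is elided after chip selects 0 and 1, so only those two are listed.

```python
# Sizes in MB as printed in the quoted "EDAC amd64: MC:" lines.
# Only the chip selects visible in the quoted logs are listed.
AFTER_PATCHSET_MB = {"UMC0": [8191, 8191, 0, 0], "UMC1": [8191, 8191, 0, 0]}
BEFORE_PATCHSET_MB = {"UMC0": [8192, 0], "UMC1": [8192, 0]}

# MemTotal from the quoted /proc/meminfo line, in kB.
MEMTOTAL_KB = 15715328


def total_mb(report):
    """Sum the per-chip-select sizes across all UMCs."""
    return sum(sum(sizes) for sizes in report.values())


if __name__ == "__main__":
    print("MemTotal:       ", MEMTOTAL_KB // 1024, "MB")
    print("before patchset:", total_mb(BEFORE_PATCHSET_MB), "MB")
    print("after patchset: ", total_mb(AFTER_PATCHSET_MB), "MB")
```

The "after" total comes out at roughly twice the "before" total, and the "before" total is the one close to MemTotal, which matches the observation that CS1 on each UMC is being reported as populated when it should be 0MB.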

Can you please send me the full kernel log and dmidecode output?

Thanks,
Yazen