Re: [PATCH 3/3] EDAC, ghes: Make it a proper module
From: Mauro Carvalho Chehab
Date: Wed Jul 26 2017 - 14:18:18 EST
On Wed, 26 Jul 2017 17:27:12 +0000
"Luck, Tony" <tony.luck@xxxxxxxxx> wrote:
> > > > Hmm... I'm not seeing any implementation that would allow setting
> > > > between firmware first, hardware first or "auto", as we've discussed.
> > >
> > > This is all coming up. As the 0/3 message said, these 3 patches are the
> > > bare minimum of reorganizing stuff only and should serve as a base.
> >
> > I'll then wait for such patch before acking this series.
>
> I didn't think that a BIOS that set "firmware first" gave the OS any choice about this.
>
> What exactly is this option going to do? Fiddle with ACPI OSC??
Currently, the HP server that I use to build the kernel is in firmware first (FF) mode:
[ 3.783803] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
I didn't try to disable FF in its BIOS. I'm not sure if that is even possible.
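As a rough illustration of how the FF state can be read back from a kernel log, here is a minimal sketch. It only matches the wording of the single log line quoted above; other kernel versions may word the message differently, and the function name `ghes_firmware_first` is made up for this example.

```python
import re

# Assumption: the pattern is derived only from the one log line quoted in
# this mail. It is not an exhaustive list of GHES messages.
FF_PATTERN = re.compile(r"GHES: APEI firmware first mode is enabled")


def ghes_firmware_first(log_text: str) -> bool:
    """Return True if any line of the given kernel log reports FF mode."""
    return any(FF_PATTERN.search(line) for line in log_text.splitlines())
```

Feeding it the output of `dmesg` should be enough for a quick check. A False result only means the message was not found in the text given, not that FF is disabled.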
Still, EDAC is working there using sb_edac. As I pointed out before, one of the
MC channels is not being detected, but I don't use it on this machine.
Except for that, EDAC seems to be working fine there:
$ ras-mc-ctl --layout
       +-----------------------------------------------------------------+
       |               mc0              |               mc1              |
       | channel0 | channel1 | channel2 | channel0 | channel1 | channel2 |
-------+-----------------------------------------------------------------+
slot2: |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |
slot1: |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |
slot0: | 16384 MB |     0 MB | 16384 MB | 16384 MB |     0 MB | 16384 MB |
-------+-----------------------------------------------------------------+
# ras-mc-ctl --guess-labels
memory stick 'PROC 1 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 1 DIMM 12' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 1' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 2' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 3' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 4' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 5' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 6' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 7' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 8' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 9' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 10' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 11' is located at 'Not Specified'
memory stick 'PROC 2 DIMM 12' is located at 'Not Specified'
I didn't try to inject an error, as I'm not sure if the EINJ feature is
enabled in this BIOS. Probably not.
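For reference, a minimal sketch of how EINJ availability could be probed from userspace. It assumes debugfs is mounted at /sys/kernel/debug and that the einj module is loaded, in which case the kernel's EINJ documentation places an `available_error_type` file under apei/einj. The function name `einj_available` is made up for this example.

```python
from pathlib import Path

# Assumption: debugfs is mounted at the usual /sys/kernel/debug location.
EINJ_DIR = Path("/sys/kernel/debug/apei/einj")


def einj_available(einj_dir: Path = EINJ_DIR) -> bool:
    """Return True if the EINJ debugfs interface is visible to this user."""
    try:
        return (einj_dir / "available_error_type").is_file()
    except PermissionError:
        # debugfs is normally root-only; as a regular user we cannot tell,
        # so report the interface as not visible.
        return False
```

A False result does not prove that the BIOS lacks EINJ: the module may simply not be loaded, or the check may have been run without root.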
At least on this machine, I very much prefer to use the sb_edac driver.
As I explained earlier in the previous thread, I just don't know if the
BIOS would be doing the right thing for CE, as I don't know its
internal algorithm.
Also, as I'm maintaining the EDAC userspace tools (rasdaemon),
I would really love to get a few CE error reports there from time to
time, as they could be used to check if rasdaemon is doing the right
thing with them.
So, I very much prefer not to have any CE threshold at all in the BIOS.
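To make the point about CE reports concrete, here is a minimal sketch that reads the per-controller corrected error counters exposed by the EDAC core in sysfs. It assumes the standard /sys/devices/system/edac/mc/mcN/ce_count layout. The function name `read_ce_counts` is made up for this example and is not part of rasdaemon.

```python
from pathlib import Path
from typing import Dict

# Assumption: the EDAC core exposes one mcN directory per memory controller.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ce_counts(root: Path = EDAC_MC_ROOT) -> Dict[str, int]:
    """Map each memory controller name (e.g. 'mc0') to its CE count."""
    counts = {}
    for ce_file in sorted(root.glob("mc*/ce_count")):
        # Each ce_count file holds a single decimal integer.
        counts[ce_file.parent.name] = int(ce_file.read_text().strip())
    return counts
```

If the BIOS applies a threshold before reporting, these counters may stay at zero even though corrected errors are happening, which is exactly the case where there is nothing for rasdaemon to be checked against.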
Thanks,
Mauro