Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac

From: Mauro Carvalho Chehab
Date: Thu Jul 20 2017 - 16:15:39 EST


On Thu, 20 Jul 2017 19:50:03 +0000
"Kani, Toshimitsu" <toshi.kani@xxxxxxx> wrote:

> On Thu, 2017-07-20 at 06:33 +0200, Borislav Petkov wrote:
> > On Wed, Jul 19, 2017 at 04:40:25PM +0000, Kani, Toshimitsu wrote:
> > > ghes_edac allows errors to be reported to OS management tools like
> > > rasdaemon, in addition to platform-specific management.
> >
> > So ghes_edac *is* a poor man's driver in the sense that it doesn't do
> > anything fancy but repeat, like a parrot, data it has gotten from the
> > firmware and shove it into the EDAC counters. At least that's the
> > intention. Nothing more.
>
> Right for ghes_edac.
>
> > All the action stuff like error detection and recovery should be done
> > by the firmware.
>
> GHES / firmware-first still requires OS recovery actions when an error
> cannot be corrected by the platform. They are handled by ghes_proc(),
> and ghes_edac remains its error-reporting wrapper.
>
> > But considering how SNAFU'd firmware is, I wouldn't expect any great
> > RAS functionality there. Of course, I'd be delighted to be proven
> > wrong.
>
> Firmware has better knowledge about the platform and can provide better
> RAS when implemented properly. I agree that user experiences may vary
> across platforms.

It may have better knowledge when the vendor ships a different BIOS
for each platform with a different motherboard silkscreen, but a lot
of vendors just use the same BIOS on different models, with the same
"Locator" and "Bank Locator" data in the DMI tables, which doesn't
match what's printed on the board's silkscreen.
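To illustrate where those strings come from, here is a minimal Python
sketch (the helper names are hypothetical, not taken from ghes_edac or
rasdaemon) that extracts "Locator" and "Bank Locator" from a raw SMBIOS
Type 17 (Memory Device) record. It assumes the DMTF SMBIOS layout, in
which both fields are 1-based string numbers at offsets 0x10 and 0x11,
pointing into the string set that follows the formatted area. Nothing in
the record ties those strings to the silkscreen; they are whatever the
BIOS author put there.

```python
from typing import Optional, Tuple

SMBIOS_TYPE_MEMORY_DEVICE = 17
OFF_DEVICE_LOCATOR = 0x10  # BYTE: string number of "Locator"
OFF_BANK_LOCATOR = 0x11    # BYTE: string number of "Bank Locator"


def smbios_string(rec: bytes, n: int) -> Optional[str]:
    """Return the n-th (1-based) string of an SMBIOS record, or None."""
    if n == 0:
        return None  # string number 0 means "no string provided"
    strings = []
    # The string set starts right after the formatted area (whose length
    # is at offset 1) and is terminated by an empty string (double NUL).
    for s in rec[rec[1]:].split(b"\x00"):
        if not s:
            break
        strings.append(s.decode("ascii", errors="replace"))
    return strings[n - 1] if n <= len(strings) else None


def memdev_locators(rec: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (Locator, Bank Locator) from a raw SMBIOS Type 17 record."""
    if len(rec) < 0x12 or rec[0] != SMBIOS_TYPE_MEMORY_DEVICE or rec[1] < 0x12:
        raise ValueError("not an SMBIOS Type 17 (Memory Device) record")
    return (smbios_string(rec, rec[OFF_DEVICE_LOCATOR]),
            smbios_string(rec, rec[OFF_BANK_LOCATOR]))


# Hypothetical record: a 0x15-byte formatted area with the other fields
# left zero, followed by the string set.
fmt = bytearray(0x15)
fmt[0] = SMBIOS_TYPE_MEMORY_DEVICE
fmt[1] = len(fmt)
fmt[OFF_DEVICE_LOCATOR] = 1
fmt[OFF_BANK_LOCATOR] = 2
rec = bytes(fmt) + b"DIMM_A1\x00BANK 0\x00\x00"
print(memdev_locators(rec))  # ('DIMM_A1', 'BANK 0')
```

On kernels built with CONFIG_DMI_SYSFS, real records can be read (as
root) from /sys/firmware/dmi/entries/17-*/raw, and "dmidecode -t 17"
shows the same fields decoded. If every model shares one BIOS, those
strings are identical across boards whatever their silkscreen says.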

So, GHES ends up exposing wrong data. Also, such BIOSes fail to
properly expose that knowledge to drivers/userspace.

In the discussions I had with HP back in 2012, the idea was to
provide some way for the GHES driver to reliably query the BIOS for
its memory layout, so that tools like ras-mc-ctl would properly
report the memory configuration (with --layout) and the motherboard
silkscreen labels (with --print-labels). Unfortunately, at least at
that time, the discussions with HP didn't proceed.

Thanks,
Mauro