Re: [PATCH 3/3] ghes_edac: add platform check to enable ghes_edac
From: Kani, Toshimitsu
Date: Wed Jul 19 2017 - 12:10:24 EST
On Wed, 2017-07-19 at 07:52 +0200, Borislav Petkov wrote:
> On Tue, Jul 18, 2017 at 09:20:44PM +0000, Kani, Toshimitsu wrote:
> > I agree that 'osc_sb_apei_support_acked' should be checked when
> > enabling ghes_edac.  I do not know the details of existing issues,
> > but it sounds unlikely that this will address all of them since
> > bugs can be everywhere.
>
> No, see below.
>
> > For instance, ghes_edac relies on DMI/SMBIOS info, unlike
> > other EDAC drivers, which can be buggy regardless of this _OSC
> > info.
>
> That's the problem with firmware. You can't really fix it and it is
> buggy as hell.
Right, and that's what I was told is an issue for ghes_edac. This is
why this patch introduces a white-list to exclude all buggy firmware
that is unknown to us...
> > I agree that making ghes_edac as a normal module is a good thing,
> > but I do not think it's going to solve this issue.
>
> Of course it will - if the firmware says it wants to look at the
> errors first, then it gets to do so. This is the whole handling of
> hardware errors in the firmware deal. I admit, sometimes it makes
> sense because the firmware has the most intimate knowledge of the
> platform and, in a perfect world, we won't ever need to have
> platform-specific EDAC drivers.
>
> But, we don't live in a perfect world. And the vendor execution of
> the whole firmware-error-handling deal is an abomination at best.
>
> So, if we realize that the firmware is buggy, we can use a platform
> list to blacklist it (^hint hint^) and have a parameter to disable
> ghes_edac from loading.
Setting up a blacklist requires us to enable ghes_edac and find all
buggy firmware to date. I think this is too disruptive for people who
are happily using regular EDAC drivers today even though their
platforms have GHES.
> But we'll deal with that when we get to cross that bridge. Right now,
> I'd like to do the loading spec-conform and not fiddle with white-,
> black-, or any-other-color lists.
I do prefer to avoid any white / black listing. But I do not see how
it solves buggy DMI/SMBIOS info, which is one example of the firmware
bugs we may have to deal with.
Thanks,
-Toshi