RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"

From: Ghannam, Yazen
Date: Tue Mar 27 2018 - 12:00:06 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Ghannam, Yazen
> Sent: Tuesday, March 27, 2018 10:02 AM
> To: Borislav Petkov <bp@xxxxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> valid bits are not set"
>
> > -----Original Message-----
> > From: Borislav Petkov <bp@xxxxxxxxx>
> > Sent: Monday, March 26, 2018 4:08 PM
> > To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> > Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> > Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if
> > valid bits are not set"
> >
> > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > So at a minimum, we should always save and report as much as we can.
> >
> > Only on Zen or all AMD families?
> >
>
> I'll confirm with the HW folks. I understand it as a change in philosophy
> rather than a change in hardware.
>

So this recommendation could apply to all families, but it's okay if we just
apply this behavior to SMCA systems. That way we don't need to worry
about changing things on legacy systems.

I'll write a new patch that abstracts the register reads and applies the
appropriate behavior to each kind of system (SMCA vs. legacy).

In any case, this patch should be reverted, since faking the valid bits will
cause the notifier blocks downstream to process errors that they shouldn't.
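
A rough userspace sketch of the split described above, not the actual patch:
on SMCA systems the address and syndrome values are always saved, on legacy
systems the address is saved only when its valid bit is set, and in neither
case is the status word modified. The names err_record, collect_error_info()
and addr_is_valid() are hypothetical, and the bit positions are assumed to
match the MCI_STATUS_* definitions in the kernel's asm/mce.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match MCI_STATUS_SYNDV/ADDRV in asm/mce.h. */
#define MCI_STATUS_SYNDV (1ULL << 53)
#define MCI_STATUS_ADDRV (1ULL << 58)

/* Hypothetical record, loosely modeled on struct mce. */
struct err_record {
	uint64_t status;
	uint64_t addr;
	uint64_t synd;
};

/*
 * Hypothetical helper: the raw register values are passed in so the
 * policy can be shown without touching real MSRs.
 */
void collect_error_info(bool smca, uint64_t status, uint64_t raw_addr,
			uint64_t raw_synd, struct err_record *rec)
{
	assert(rec != NULL);

	/* The status word is stored as read; valid bits are never faked. */
	rec->status = status;
	rec->addr = 0;
	rec->synd = 0;

	if (smca) {
		/* SMCA: save as much as we can, valid bits or not. */
		rec->addr = raw_addr;
		rec->synd = raw_synd;
		return;
	}

	/* Legacy: keep the old behavior and honor the valid bit. */
	if (status & MCI_STATUS_ADDRV)
		rec->addr = raw_addr;
}

/* What a downstream consumer would check before acting on the address. */
bool addr_is_valid(const struct err_record *rec)
{
	return (rec->status & MCI_STATUS_ADDRV) != 0;
}
```

Because the status word is left alone, a consumer that checks
addr_is_valid() still skips records whose address was saved only for
reporting purposes.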

Thanks,
Yazen