RE: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"

From: Ghannam, Yazen
Date: Thu Aug 23 2018 - 13:53:30 EST


> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Borislav Petkov
> Sent: Thursday, August 23, 2018 7:24 AM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid
> bits are not set"
>
> Reviving an old issue while cleaning my inbox.
>
> On Tue, Mar 27, 2018 at 03:59:37PM +0000, Ghannam, Yazen wrote:
> > > > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > > > So at a minimum, we should always save and report as much as we can.
> > > >
> > > > Only on Zen or all AMD families?
> > > >
> > >
> > > I'll confirm with the HW folks. I understand it as a change in philosophy
> > > rather than a change in hardware.
> > >
> >
> > So this recommendation could apply to all families, but it's okay if we just
>
> Ok, so I think we should do this, still, as it is exactly what the
> recommendation says: read the MSRs even if the valid bits are not set and it
> doesn't set any Valid bits to confuse error handling downstream.
>
> This way we'll collect all possible info and then mce_amd.c should stop looking
> at the valid bits too and dump whatever has been logged.
>
> Ok?
>

Yes, this seems okay to me.

Thanks,
Yazen