RE: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND} register contents
From: Ghannam, Yazen
Date: Mon Mar 26 2018 - 16:05:44 EST
> -----Original Message-----
> From: linux-edac-owner@xxxxxxxxxxxxxxx <linux-edac-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Borislav Petkov
> Sent: Monday, March 26, 2018 3:35 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH 2/2] x86/MCE: Always save MCA_{ADDR,MISC,SYND}
> register contents
>
> On Mon, Mar 26, 2018 at 02:15:26PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> >
> > The Intel SDM and AMD APM both state that the contents of the
> MCA_ADDR
> > register should be saved if MCA_STATUS[ADDRV] is set. The same applies
> > to MCA_MISC and MCA_SYND (on SMCA systems) and their respective valid
> > bits.
> >
> > However, the Fam17h Processor Programming Reference states
> > "Error handlers should save the values in MCA_ADDR, MCA_MISC0, and
> > MCA_SYND even if MCA_STATUS[AddrV], MCA_STATUS[MiscV], and
> > MCA_STATUS[SyndV] are zero."
>
> Well, then you can't remove valid bit checks for older families. This
> sounds like F17h only.
>
> If so, it better be abstracted away cleanly and not changing the generic
> code.
>
Sure, I can do that. But I didn't think it was necessary because it doesn't hurt
to read the registers whether or not the valid bits are set.
> >
> > This is to ensure that all MCA state information is collected even if
> > software cannot act upon it (because the valid bits are cleared).
> >
> > So always save the auxiliary MCA register contents even if the valid
> > bits are cleared. This should not affect error processing because
> > software should still check the valid bits before using the register
> > contents for error processing.
> >
> > Also, print MCA_{ADDR,MISC,SYND} even if their valid bits are not set.
> > Printing from EDAC/mce_amd is included here since we want to do this on
> > AMD systems.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> > ---
> > arch/x86/kernel/cpu/mcheck/mce.c | 23 +++++++----------------
> > arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +++-------
> > drivers/edac/mce_amd.c | 10 +++-------
> > 3 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 42cf2880d0ed..a556e1cadfbc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -248,19 +248,14 @@ static void __print_mce(struct mce *m)
> > }
> >
> > pr_emerg(HW_ERR "TSC %llx ", m->tsc);
> > - if (m->addr)
> > - pr_cont("ADDR %llx ", m->addr);
> > - if (m->misc)
> > - pr_cont("MISC %llx ", m->misc);
> > + pr_cont("ADDR %016llx ", m->addr);
> > + pr_cont("MISC %016llx\n", m->misc);
>
> You simply can't do this - this is generic code, not AMD only.
>
I can change this if you'd like. I just thought it would be simpler to
make the change here since we're just printing the values.
Thanks,
Yazen