RE: [PATCH] x86/MCE, EDAC/mce_amd: Save all aux registers on SMCA systems

From: Ghannam, Yazen
Date: Tue Apr 17 2018 - 14:30:42 EST


> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: Tuesday, April 17, 2018 1:21 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
> Cc: linux-edac@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> tony.luck@xxxxxxxxx; x86@xxxxxxxxxx
> Subject: Re: [PATCH] x86/MCE, EDAC/mce_amd: Save all aux registers on
> SMCA systems
>
> On Mon, Apr 02, 2018 at 02:57:07PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@xxxxxxx>
> >
> > The Intel SDM and AMD APM both state that the auxiliary MCA registers
> > should be read if their respective valid bits are set in MCA_STATUS.
> >
> > The Processor Programming Reference for AMD Fam17h systems has a new
> > recommendation that the auxiliary registers should be saved
> > unconditionally. This recommendation can be retroactively applied to
> > older AMD systems. However, we only need to apply this to SMCA systems
> > to avoid modifying behavior on older systems.
>
> Applying the logic of that recommendation on older systems: wouldn't it
> be prudent to save them there too, if it helps debugging an MCE?
>

We could, but it's an issue of documentation and testing on the older systems.

My first pass at this was to unconditionally read the registers because my
understanding was that registers that aren't accessible would be read-as-zero.
I thought this was a common MCA implementation. But Tony pointed out that
this isn't the case on Intel systems. This is the case on recent AMD systems. But
I don't know if it's the case on older systems which may or may not have
followed the Intel implementation more closely.

So to be safe, HW folks said we can restrict this to only SMCA systems because
1) The recommendation first shows up in the Fam17h PPR.
2) We know it's safe from Fam17h onwards.

> > Define a separate function to save all auxiliary registers on SMCA
> > systems. Call this function from both the MCE handlers and the AMD LVT
> > interrupt handlers so that we don't duplicate code.
> >
> > Print all auxiliary registers in EDAC/mce_amd. Don't restrict this to
> > SMCA systems in order to save a conditional and keep the format similar
> > between SMCA and non-SMCA systems.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@xxxxxxx>
>
> ...
>
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > index f7666eef4a87..b00d5fff1848 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > @@ -244,6 +244,47 @@ static void smca_configure(unsigned int bank,
> unsigned int cpu)
> > }
> > }
> >
> > +
> > +static bool _smca_read_aux(struct mce *m, int bank, bool read_addr)
> > +{
> > + if (!mce_flags.smca)
> > + return false;
> > +
> > + rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
> > + rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
> > +
> > + /*
> > + * We should already have a value if we're coming from the
> Threshold LVT
> > + * interrupt handler. Otherwise, read it now.
> > + */
> > + if (!m->misc)
> > + rdmsrl(msr_ops.misc(bank), m->misc);
> > +
> > + /*
> > + * Read MCA_ADDR if we don't have it already. We should already
> have it
> > + * if we're coming from the interrupt handlers.
> > + */
> > + if (read_addr)
>
> Why not
>
> if (!m->addr)
>
> ?
>
> And yeah, if it has been read to 0 already, reading it again won't
> change anything.
>
> And thinking about it more, you don't really need those if-tests, I'd
> say. So what, you'll read one or two MSRs once more. It is not such a
> hot path that we can't stomach the perf penalty of reading the MSRs.
>

The issue here is that we share this path with the interrupt handlers,
specifically the Deferred error interrupt handler. The DFR handler will
read from MCA_ADDR or MCA_DEADDR so we should just use what it
got. Otherwise, we may read MCA_ADDR and assume it's correct for
the error.

For example,

Deferred error occurs:
- MCA_{STATUS,ADDR,DESTAT,DEADDR} all have valid data.

MCE occurs:
- MCA_{STATUS,ADDR} are overwritten with non-zero data.
- MCE handler clears MCA_STATUS. MCA_ADDR is non-zero.

DFR handler finds MCA_STATUS[Deferred] is clear, so it saves
MCA_DESTAT and MCA_DEADDR, and MCA_DEADDR is 0.

If we test !m->addr (where m->addr holds MCA_DEADDR), then we read
MCA_ADDR, which has the address from the MCE.
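
A minimal sketch of the scenario above, in user-space C. Everything here is
a hypothetical stand-in: struct fake_bank models one bank's registers,
struct fake_mce models the two fields of struct mce that matter, and the
register values are arbitrary placeholders. It only illustrates why a zero
test on m->addr can pick up the wrong address while an explicit flag cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's struct mce or real MSR reads. */
struct fake_bank { uint64_t status, addr, destat, deaddr; };
struct fake_mce  { uint64_t status, addr; };

/* Variant A: re-read MCA_ADDR whenever the saved address is zero. */
static void read_addr_if_zero(struct fake_mce *m, const struct fake_bank *b)
{
	if (!m->addr)
		m->addr = b->addr;
}

/* Variant B: the caller states whether MCA_ADDR still needs to be read. */
static void read_addr_if_asked(struct fake_mce *m, const struct fake_bank *b,
			       bool read_addr)
{
	if (read_addr)
		m->addr = b->addr;
}

/* Replay the DFR handler after an MCE has reused and cleared the bank. */
static uint64_t dfr_saved_addr(bool use_zero_test)
{
	struct fake_bank b = {
		.status = 0,		/* cleared by the MCE handler */
		.addr   = 0x1234000,	/* left over from the MCE (placeholder) */
		.destat = 1,		/* placeholder for valid deferred status */
		.deaddr = 0,		/* deferred error address is 0 */
	};
	/* The DFR handler already saved MCA_DESTAT and MCA_DEADDR. */
	struct fake_mce m = { .status = b.destat, .addr = b.deaddr };

	if (use_zero_test)
		read_addr_if_zero(&m, &b);
	else
		read_addr_if_asked(&m, &b, false);

	return m.addr;
}

static void check_scenario(void)
{
	/* Zero test: the deferred error is reported with the MCE's address. */
	assert(dfr_saved_addr(true) == 0x1234000);
	/* Explicit flag: the deferred error keeps its own address. */
	assert(dfr_saved_addr(false) == 0);
}
```

Calling check_scenario() from a test main() exercises both variants.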


> > + rdmsrl(msr_ops.addr(bank), m->addr);
> > +
> > + /*
> > + * Extract [55:<lsb>] where lsb is the least significant
> > + * *valid* bit of the address bits.
> > + */
> > + if (m->addr) {
>
> And that test is probably not needed either: if m->addr is 0, the
> below would be 0 anyway.
>
> > + u8 lsb = (m->addr >> 56) & 0x3f;
> > +
> > + m->addr &= GENMASK_ULL(55, lsb);
> > + }
> > +
> > + return true;
> > +}
>
> IOW, those tests are probably ok but getting rid of them would make the
> code more readable and I think we can afford that here.
>

Okay, I'll get rid of the last test. But the first one is necessary with our current
code flow.
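
For reference, a small user-space sketch of why the last test can go: with
m->addr == 0 the lsb field is 0 and the mask is applied to 0, so the result
is still 0. GENMASK_ULL below is a local copy written to behave like the
kernel macro, and smca_mask_addr() is a hypothetical helper name used only
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of GENMASK_ULL(h, l): bits h..l set, for 0 <= l <= h <= 63. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) - (1ULL << (l)) + 1) & (~0ULL >> (63 - (h))))

/* Hypothetical helper mirroring the masking step in the quoted patch. */
static uint64_t smca_mask_addr(uint64_t raw)
{
	/* Bits [61:56] hold the least significant valid address bit. */
	uint8_t lsb = (raw >> 56) & 0x3f;

	/* Keep [55:lsb]; no zero check is needed before masking. */
	return raw & GENMASK_ULL(55, lsb);
}

static void check_mask(void)
{
	/* A zero address stays zero without any special casing. */
	assert(smca_mask_addr(0) == 0);

	/* lsb = 12: bits [11:0] and the lsb field itself are dropped. */
	assert(smca_mask_addr((12ULL << 56) | 0x123456789abcULL) ==
	       0x123456789000ULL);
}
```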

Thanks,
Yazen