Re: [PATCH 1/2] Revert "x86/mce/AMD: Collect error info even if valid bits are not set"

From: Borislav Petkov
Date: Thu Aug 23 2018 - 08:24:20 EST


Reviving an old issue while cleaning my inbox.

On Tue, Mar 27, 2018 at 03:59:37PM +0000, Ghannam, Yazen wrote:
> > > On Mon, Mar 26, 2018 at 07:58:51PM +0000, Ghannam, Yazen wrote:
> > > > So at a minimum, we should always save and report as much as we can.
> > >
> > > Only on Zen or all AMD families?
> > >
> >
> > I'll confirm with the HW folks. I understand it as a change in philosophy
> > rather than a change in hardware.
> >
>
> So this recommendation could apply to all families, but it's okay if we just

Ok, so I think we should do this, still, as it is exactly what the
recommendation says: read the MSRs even if the valid bits are not set and it
doesn't set any Valid bits to confuse error handling downstream.

This way we'll collect all possible info and then mce_amd.c should stop looking
at the valid bits too and dump whatever has been logged.

Ok?

---
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4b767284b7f5..c14674025615 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -639,10 +639,12 @@ static struct notifier_block mce_default_nb = {
*/
static void mce_read_aux(struct mce *m, int i)
{
- if (m->status & MCI_STATUS_MISCV)
+ bool is_amd = m->cpuvendor == X86_VENDOR_AMD;
+
+ if (m->status & MCI_STATUS_MISCV || is_amd)
m->misc = mce_rdmsrl(msr_ops.misc(i));

- if (m->status & MCI_STATUS_ADDRV) {
+ if (m->status & MCI_STATUS_ADDRV || is_amd) {
m->addr = mce_rdmsrl(msr_ops.addr(i));

/*
@@ -668,7 +670,7 @@ static void mce_read_aux(struct mce *m, int i)
if (mce_flags.smca) {
m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));

- if (m->status & MCI_STATUS_SYNDV)
+ if (m->status & MCI_STATUS_SYNDV || is_amd)
m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
}
}

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--