RE: [PATCH] x86/MCE/AMD: Always give PANIC severity for UC errors in kernel context

From: Ghannam, Yazen
Date: Wed Sep 27 2017 - 11:18:04 EST


> -----Original Message-----
> From: Borislav Petkov [mailto:bp@xxxxxxxxx]
> Sent: Tuesday, September 26, 2017 6:21 PM
> To: Ghannam, Yazen <Yazen.Ghannam@xxxxxxx>
...
> > There are the stable branches on kernel.org and some distro kernels
> > based on older kernel versions.
> >
> > The AMD severity grading function was introduced in v4.1 and has this
> > issue.
> > However, the following commit was included in v4.6 and masks the issue.
> >
> > b2f9d678e28c x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT
> > exception table entries
> >
> > This patch will apply to v4.9 and later. Another version will be
> > needed to apply to the v4.1 and v4.4 stable branches.
>
> Then write that in the commit message. But *also* add the main reason why
> you're doing this - to explicitly state that IN_KERNEL context is panicked on
> AMD. Because if it weren't for it, old kernels should simply backport
> b2f9d678e28c and be done with it.
>

Okay, will do.

> And I still don't understand the IN_KERNEL_RECOV thing you mention in the
> commit message. That's Intel-only, what does it have to do with AMD?
>

Generally, we can use the IN_KERNEL_RECOV context to show that the error
is recoverable versus IN_KERNEL which we can consider unrecoverable.

Specifically, the Intel SER and AMD SUCCOR features represent the same
thing (MCA Recovery). I'll send another patch for enabling recovery on
AMD SUCCOR systems. I want to keep this patch as just a bug fix.
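
To illustrate the distinction between the contexts, here is a rough
standalone userspace sketch. It is not the kernel's grading code: the
severity names, the grade_uc_error() helper, and the has_recovery flag
(standing in for a feature like SER or SUCCOR) are all hypothetical, and
the enum values are assumed for illustration. It only shows the general
idea that a UC error in plain IN_KERNEL context is graded as a panic,
while IN_KERNEL_RECOV could be treated as recoverable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; assumed values, not copied from the kernel. */
enum context { IN_KERNEL = 1, IN_USER = 2, IN_KERNEL_RECOV = 3 };

/* Hypothetical severity levels for this sketch only. */
enum severity { SEV_KEEP, SEV_AR, SEV_PANIC };

#define MCI_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */

/*
 * Grade an error from its status bits and the context it occurred in.
 * has_recovery models whether the platform supports MCA recovery.
 */
static enum severity grade_uc_error(uint64_t status, enum context err_ctx,
				    bool has_recovery)
{
	/* Not an uncorrected error: nothing to escalate here. */
	if (!(status & MCI_STATUS_UC))
		return SEV_KEEP;

	/* Corrupt processor context can never be recovered. */
	if (status & MCI_STATUS_PCC)
		return SEV_PANIC;

	/* Plain kernel context is considered unrecoverable. */
	if (err_ctx == IN_KERNEL)
		return SEV_PANIC;

	/* User or recoverable-kernel context needs recovery support. */
	if (has_recovery)
		return SEV_AR;

	return SEV_PANIC;
}
```

With this sketch, grade_uc_error(MCI_STATUS_UC, IN_KERNEL, true) gives
SEV_PANIC regardless of has_recovery, which is the behavior the bug fix
is after.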

> Btw, while at it, fix that signature
>
> static int mce_severity_amd_smca(struct mce *m, int err_ctx)
>
> to
>
> static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
>

Sure, I'll do this in another patch. I want to keep this as a bug fix to apply to
the stable branches.
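
For reference, a small standalone sketch of what the enum-typed parameter
buys us. The context_is_kernel() helper is hypothetical and the enum
values are assumed for illustration. With "enum context" in the signature
and no default case, a compiler with -Wswitch enabled can warn if a new
context is added but not handled, which a plain int parameter would not
give us.

```c
#include <stdbool.h>

/* Assumed values for illustration. */
enum context { IN_KERNEL = 1, IN_USER = 2, IN_KERNEL_RECOV = 3 };

/* Hypothetical helper: true if the error was taken in kernel context. */
static bool context_is_kernel(enum context err_ctx)
{
	/* No default case, so -Wswitch can flag unhandled enumerators. */
	switch (err_ctx) {
	case IN_KERNEL:
	case IN_KERNEL_RECOV:
		return true;
	case IN_USER:
		return false;
	}

	/* Out-of-range value: be conservative and treat as kernel. */
	return true;
}
```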

Thanks,
Yazen