Re: [PATCH] x86/mce: add support SRAO reported via CMC check

From: Xie XiuQi
Date: Wed Nov 15 2017 - 22:02:25 EST


Hi Borislav, Tony,

On 2017/11/15 18:33, Borislav Petkov wrote:
> On Wed, Nov 15, 2017 at 02:44:07AM +0000, Luck, Tony wrote:
>> This code is subtle :-(
>
> I'm glad that we agree on this! :-)
>
> Anyone wanting to rewrite it yet?
>

According to the Intel SDM, Volume 3B (253669-063US, July 2017), an SRAO
error can be reported either via MCE or via CMC:

In cases when SRAO is signaled via CMCI the error signature is
indicated via UC=1, PCC=0, S=0.

Type(*1)  UC  EN     PCC  S      AR  Signaling
----------------------------------------------
UC        1   1      1    x      x   MCE
SRAR      1   1      0    1      1   MCE
SRAO      1   x(*2)  0    x(*2)  0   MCE/CMC
UCNA      1   x      0    0      0   CMC
CE        0   x      x    x      x   CMC

NOTES:
1. SRAR, SRAO and UCNA errors are supported by the processor only
when IA32_MCG_CAP[24] (MCG_SER_P) is set.
2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.
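To make the table concrete, here is a rough sketch (not kernel code) that
decodes the five flag bits of IA32_MCi_STATUS row by row. The bit positions
(UC=61, EN=60, PCC=57, S=56, AR=55) are the architectural ones; the enum and
function names are made up for illustration, and MCG_SER_P is assumed to be
set. Note that from these five bits alone an SRAO signaled via CMC looks
exactly like a UCNA, so something else (e.g. the MCA error code) has to break
the tie.

```c
#include <stdint.h>

/* IA32_MCi_STATUS flag bits used in the table above. */
#define MCI_STATUS_UC  (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_EN  (1ULL << 60)  /* error reporting enabled */
#define MCI_STATUS_PCC (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56)  /* machine check exception generated */
#define MCI_STATUS_AR  (1ULL << 55)  /* action required */

/* Hypothetical names: one value per table row and signaling path. */
enum mce_type {
	TYPE_CE,                /* CE, via CMC */
	TYPE_UC,                /* UC, via MCE */
	TYPE_SRAR,              /* SRAR, via MCE */
	TYPE_SRAO_MCE,          /* SRAO, via MCE (EN=1, S=1) */
	TYPE_SRAO_OR_UCNA_CMC,  /* SRAO or UCNA, via CMC: same flag bits */
	TYPE_UNKNOWN            /* combination not listed in the table */
};

/* Walk the table rows in order; columns marked "x" are not checked. */
static enum mce_type classify(uint64_t status)
{
	int uc  = !!(status & MCI_STATUS_UC);
	int en  = !!(status & MCI_STATUS_EN);
	int pcc = !!(status & MCI_STATUS_PCC);
	int s   = !!(status & MCI_STATUS_S);
	int ar  = !!(status & MCI_STATUS_AR);

	if (!uc)
		return TYPE_CE;
	if (pcc)
		return en ? TYPE_UC : TYPE_UNKNOWN;
	if (ar)
		return (en && s) ? TYPE_SRAR : TYPE_UNKNOWN;
	/* UC=1, PCC=0, AR=0: only S (and EN) tell MCE from CMC. */
	if (s)
		return en ? TYPE_SRAO_MCE : TYPE_UNKNOWN;
	return TYPE_SRAO_OR_UCNA_CMC;
}
```

For example, a status with only UC and EN set falls into the last case: it is
a CMC-signaled SRAO or UCNA, and the flag bits cannot say which.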

Section 15.6.2, "UCR Error Reporting and Logging", also describes the S bit:

S (Signaling) flag, bit 56 - Indicates (when set) that a machine check
exception was generated for the UCR error reported in this MC bank...
When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
was not signaled via a machine check exception and instead was reported
as a corrected machine check (CMC).

Given this description in the SDM, I think the S flag can be used to
determine whether an MCE or a CMC was signaled. So we could merge these two
cases into one and simply remove the S=0 check for SRAO.
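A rough sketch of what the merge could look like, written as mask/result
rules in a style similar to (but not copied from) the kernel's severity
table. The struct, the rule names, and the use of the memory-scrubbing MCA
error code (0000 0000 1100 CCCC) as the SRAO discriminator are assumptions
for illustration only. The merged rule just leaves S (and EN) out of the
mask; S is still there afterwards if we want to know which path reported the
error.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_UC  (1ULL << 61)
#define MCI_STATUS_EN  (1ULL << 60)
#define MCI_STATUS_PCC (1ULL << 57)
#define MCI_STATUS_S   (1ULL << 56)
#define MCI_STATUS_AR  (1ULL << 55)

/* Assumption: memory scrubbing error code, channel bits CCCC ignored. */
#define MCACOD_SCRUB    0x00c0ULL
#define MCACOD_SCRUBMSK 0xfff0ULL

/* Hypothetical rule: matches when (status & mask) == result. */
struct sev_rule {
	uint64_t mask;
	uint64_t result;
};

/* Before: two rules, one per signaling path, differing in S (and EN). */
static const struct sev_rule srao_via_mce = {
	.mask   = MCI_STATUS_UC | MCI_STATUS_EN | MCI_STATUS_PCC |
		  MCI_STATUS_S | MCI_STATUS_AR | MCACOD_SCRUBMSK,
	.result = MCI_STATUS_UC | MCI_STATUS_EN | MCI_STATUS_S | MCACOD_SCRUB,
};
static const struct sev_rule srao_via_cmc = {
	.mask   = MCI_STATUS_UC | MCI_STATUS_PCC | MCI_STATUS_S |
		  MCI_STATUS_AR | MCACOD_SCRUBMSK,
	.result = MCI_STATUS_UC | MCACOD_SCRUB,
};

/* After: S and EN are not checked, so one rule covers MCE and CMC. */
static const struct sev_rule srao_merged = {
	.mask   = MCI_STATUS_UC | MCI_STATUS_PCC | MCI_STATUS_AR | MCACOD_SCRUBMSK,
	.result = MCI_STATUS_UC | MCACOD_SCRUB,
};

static bool rule_matches(const struct sev_rule *r, uint64_t status)
{
	return (status & r->mask) == r->result;
}

/* Per SDM 15.6.2: S set means an MCE was generated, S clear means CMC. */
static const char *signaling_path(uint64_t status)
{
	return (status & MCI_STATUS_S) ? "MCE" : "CMC";
}
```

With these definitions, a scrub error with UC|EN|S set matches only
srao_via_mce, one with just UC set matches only srao_via_cmc, and both match
srao_merged, while setting AR or PCC still keeps the merged rule from
matching.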

How about this patch?