Re: [PATCH 3/3] x86/mce: Check for alternate indication of machine check recovery on Skylake

From: Dan Williams
Date: Thu Jun 07 2018 - 16:18:37 EST


On Thu, Jun 7, 2018 at 10:43 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> On Fri, May 25, 2018 at 02:42:09PM -0700, Tony Luck wrote:
>> Currently we just check the "CAPID0" register to see whether the CPU
>> can recover from machine checks.
>>
>> But there are also some special SKUs which do not have all advanced
>> RAS features, but do enable machine check recovery for use with NVDIMMs.
>>
>> Add a check for any of bits {8:5} in the "CAPID5" register (each
>> reports some NVDIMM mode available, if any of them are set, then
>> the system supports memory machine check recovery).
>>
>> Cc: stable@xxxxxxxxxxxxxxx # 4.9
>> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>> ---
>
> Has this stalled somewhere? I'd like to see this one go into the
> 4.18 merge because it unbreaks some real hardware.
>
> Parts 1 & 2 are nice-to-have, but they just make for better error
> messages so aren't as critical.

I'm making an effort to get all the persistent memory error-handling
holes covered this cycle, so I think it makes sense for this to go
through the nvdimm tree. This looks sufficiently non-controversial that
I could justify sending it to Linus along with the other pmem updates.
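
For anyone skimming the thread, the check described in the quoted
changelog could be sketched roughly as below. This is only an
illustration of the logic, not the patch itself. The function name, the
macro name, and the capid0_recovery_mask parameter are hypothetical:
the changelog does not say which CAPID0 bit is tested, so the caller
supplies it. The only detail taken from the changelog is that any set
bit in CAPID5[8:5] indicates recovery support.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bits 8:5 of CAPID5, as stated in the changelog: each bit reports
 * one available NVDIMM mode. (Macro name is hypothetical.)
 */
#define CAPID5_NVDIMM_MODE_MASK 0x1e0u

/*
 * Hypothetical helper: returns true if either indication of machine
 * check recovery is present.
 *
 * capid0_recovery_mask is an assumption supplied by the caller; the
 * changelog does not name the CAPID0 bit(s) involved.
 */
bool mc_recovery_supported(uint32_t capid0,
                           uint32_t capid0_recovery_mask,
                           uint32_t capid5)
{
	/* Existing indication: the CAPID0 recovery capability. */
	if (capid0 & capid0_recovery_mask)
		return true;

	/* Alternate indication: any NVDIMM mode bit set in CAPID5[8:5]. */
	return (capid5 & CAPID5_NVDIMM_MODE_MASK) != 0;
}
```

In a real quirk the two register values would come from PCI config
space reads; they are passed in as plain integers here so the logic can
be exercised on its own.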