Re: [PATCH 1/3] x86, ras: Add new infrastructure for machine check fixup tables

From: Luck, Tony
Date: Thu Nov 12 2015 - 14:44:44 EST


On Wed, Nov 11, 2015 at 08:14:56PM -0800, Andy Lutomirski wrote:
> On 11/06/2015 12:57 PM, Tony Luck wrote:
> >Copy the existing page fault fixup mechanisms to create a new table
> >to be used when fixing machine checks. Note:
> >1) At this time we only provide a macro to annotate assembly code
> >2) We assume all fixups will be in code built into the kernel.
>
> Shouldn't the first step be to fixup failures during user memory access?

We already have code to recover from machine checks encountered
while the processor is executing ring3 code.

This series gently extends that recovery to ring0 code in a few places
that look high-profile enough to warrant the attention (and for which we
have some plan for a recovery action). The initial user will be filesystem
code using NVDIMM as storage, i.e. lots of memory accessed by a small
amount of code. If we get a machine check reading the NVDIMM, then we turn
it into -EIO.
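
To make the idea concrete, here is a rough user-space sketch of how a
fixup table lookup could work. It is not the code from the patch: the
struct layout and the names mc_fixup_entry, mc_fixup_search, mc_try_fixup
and nvdimm_read_status are all made up for illustration, and the sketch
assumes the table is sorted by faulting instruction address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical table entry (not the layout used by the patch): the
 * address of an instruction that may take a machine check, and the
 * address to resume at if it does.
 */
struct mc_fixup_entry {
	uintptr_t fault_ip;
	uintptr_t fixup_ip;
};

/*
 * Binary search over a table sorted by fault_ip.
 * Returns the matching entry, or NULL if ip has no fixup.
 */
static const struct mc_fixup_entry *
mc_fixup_search(const struct mc_fixup_entry *table, size_t n, uintptr_t ip)
{
	size_t lo = 0, hi = n;

	assert(table != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].fault_ip == ip)
			return &table[mid];
		if (table[mid].fault_ip < ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Simulated handler step: if the faulting ip has a fixup, redirect
 * execution there and return 1. Otherwise leave ip alone and return 0,
 * meaning the error is not recoverable by this mechanism.
 */
static int mc_try_fixup(const struct mc_fixup_entry *table, size_t n,
			uintptr_t *ip)
{
	const struct mc_fixup_entry *e = mc_fixup_search(table, n, *ip);

	if (!e)
		return 0;
	*ip = e->fixup_ip;
	return 1;
}

/*
 * Hypothetical caller-side mapping: a read that hit a fixed-up machine
 * check is reported to the filesystem as -EIO.
 */
static int nvdimm_read_status(int hit_machine_check)
{
	return hit_machine_check ? -EIO : 0;
}
```

With a table such as { {0x1000, 0x9000}, {0x1010, 0x9008} }, a fault at
0x1010 would resume at 0x9008 and the read would end up returning -EIO,
while a fault at an address with no entry would not be fixed up.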

> This does something really weird to rax. (Also, what happens on 32-bit
> kernels? There's no bit 63.)

32-bit kernels are out of luck for this - but I don't feel bad about it -
you simply cannot run a 32-bit kernel on machines that have this level
of recovery (they have too much memory to boot 32-bit kernels).

> Please at least document it clearly.

Will do.

-Tony