Re: [PATCH v3 02/22] x86,mce: Delete ist_begin_non_atomic()

From: Andy Lutomirski
Date: Wed Feb 19 2020 - 17:48:40 EST



> On Feb 19, 2020, at 2:33 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
>>
>> One big question here: are memory failure #MC exceptions synchronous
>> or can they be delayed? If we get a memory failure, is it possible
>> that the #MC hits some random context and not the actual context where
>> the error occurred?
>
> There are a few cases:
> 1) SRAO (Software recoverable action optional) [Patrol scrub or L3 cache eviction]
> These aren't synchronous with any core execution. Using machine check to signal
> was probably a mistake - compounded by it being broadcast :-( Could pick any CPU
> to handle (actually choose the first to arrive in do_machine_check()). That guy should
> arrange to soft offline the affected page. Every CPU can return to what they were doing
> before.

You could handle this by sending IPI-to-self and dealing with it in the interrupt handler. Or even wake a high-priority kthread or workqueue. irq_work may help. Relying on task_work or the non_atomic stuff seems silly - you can't rely on anything about the interrupted context, and the context is more or less irrelevant anyway.

>
> 2) SRAR (Software recoverable action required)
> These are synchronous. Starting with Skylake they may be signaled just to the thread
> that hit the poison. Earlier generations broadcast.

Here's where dealing with one that came from kernel code is just nasty, right?

I would argue that, if IF=0, killing the machine is reasonable. If IF=1, we should be okay. Actually making this work sanely is gross, and arguably the goal should be minimizing grossness.

Perhaps, if we came from kernel mode, we should IPI-to-self and use a special vector that is idtentry, not apicinterrupt. Or maybe even do this for entries from usermode just to keep everything consistent.
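
Roughly, the policy I have in mind looks like the sketch below. This is a user-space model only, not kernel code: every name in it (srar_policy, struct srar_ctx, the enum values) is made up for illustration, and it assumes user mode normally runs with IF=1 so it can share the deferred path.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this models the policy, it is not kernel code. */
enum srar_action {
	SRAR_PANIC,          /* kernel context with IF=0: give up */
	SRAR_DEFER_SELF_IPI, /* finish recovery from a self-IPI on a dedicated vector */
};

struct srar_ctx {
	bool from_user; /* the #MC interrupted ring 3 */
	bool irqs_on;   /* IF was set in the interrupted context */
};

static enum srar_action srar_policy(struct srar_ctx c)
{
	/* Assumption: user mode normally runs with IF=1, so use the same deferred path. */
	if (c.from_user)
		return SRAR_DEFER_SELF_IPI;

	/* Kernel code with interrupts off: killing the machine is reasonable. */
	if (!c.irqs_on)
		return SRAR_PANIC;

	/* Kernel code with interrupts on: the self-IPI runs once the #MC handler returns. */
	return SRAR_DEFER_SELF_IPI;
}
```

The point of routing user-mode entries through the same path is only consistency; nothing here says how the deferred handler itself would offline the page or signal the task.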

> 2a) Hit in ring3 code ... we want to offline the page and SIGBUS the task(s)
> 2b) memcpy_mcsafe() ... kernel has a recovery path. "Return" to the recovery code instead of to the original RIP.
> 2c) copy_from_user ... not implemented yet. We are in kernel, but would like to treat this like case 2a
>
> 3) Fatal
> Always broadcast. Some bank has MCi_STATUS.PCC==1. System must be shut down.

Easy :)

It would be really, really nice if NMI was masked in MCE context.

>
> -Tony