Re: [PATCH v2 4/5] x86/mce: Simplify flow when handling recoverable memory errors

From: Andy Lutomirski
Date: Tue Nov 11 2014 - 11:23:13 EST


On Tue, Nov 11, 2014 at 8:13 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Tue, Nov 11, 2014 at 07:42:48AM -0800, Andy Lutomirski wrote:
>> The last time I looked at the MCE code, I got a bit lost in the
>> control flow. Is there ever a userspace-killing MCE that's delivered
>> from kernel mode?
>
> Yep, so while you're executing a userspace process, you get
> an #MC raised which reports an error for which action is
> required, i.e. look at all those MCE_AR_SEVERITY errors in
> arch/x86/kernel/cpu/mcheck/mce-severity.c.
>
> It happened within the context of current so we go and run the #MC
> handler which decides that the process needs to be killed in order to
> contain the error. So after we exit the handler and before we return to
> try to sched in the process again on any core, we want to actually kill
> it and poison all its memory.
>
>> By that, I mean that I think that all userspace-killing MCEs have
>> user_mode_vm(regs) and go through paranoid_exit.
>
> Yes.
>
>> If so, why do you need to jump through hoops at all? You can't call
>> do_exit, but it should be completely safe to force a fatal signal and
>> let the scheduler and signal code take care of killing the process,
>> right? For that matter, you should also be able to poke at vm
>> structures, etc.
>
> Well, we do that already. memory-failure.c does kill the processes when
> it decides to.
>
> The only question is whether adding two new members to task_struct is
> ok. It is nicely convenient and it all falls into place.
>
> In the #MC handler we do:
>
>         if (worst == MCE_AR_SEVERITY) {
>                 /* schedule action before return to userland */
> +               current->paddr = m.addr;
> +               current->restartable = !!(m.mcgstatus & MCG_STATUS_RIPV);
>                 set_thread_flag(TIF_MCE_NOTIFY);
>         }
>
> and then before we return to userspace we do:
>
> +       if (!current->restartable)
>                 flags |= MF_MUST_KILL;
>         if (memory_failure(pfn, MCE_VECTOR, flags) < 0) {
>
> and the MF_MUST_KILL makes sure memory_failure() does a force_sig().
>
> So I think this is ok, I only think that people might oppose the two new
> members to task_struct but it looks clean to me this way. IMHO at least.
>

I think it's okay-ish, but only if it's necessary, and I still don't
see why it's necessary.

Can't you just remove TIF_MCE_NOTIFY entirely and do all the
mce_notify_process work directly in do_machine_check? IOW, why do you
need to store any state per-task when it's already on the stack
anyway?
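
To make the idea concrete, here is a rough userspace mock of the flow I
have in mind. It is a sketch, not kernel code: struct mce_mock,
memory_failure_mock() and handle_ar_error_inline() are hypothetical
names, the constants are local stand-ins for the kernel's, and starting
from MF_ACTION_REQUIRED is my assumption about what the notify path
passes today. The only point is that the address and the RIPV bit are
consumed straight from the handler's local data, so nothing needs to be
parked in task_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for kernel constants; this is a mock, not kernel code. */
#define MCG_STATUS_RIPV     (1ULL << 0)   /* restart IP is valid */
#define MF_ACTION_REQUIRED  (1 << 1)
#define MF_MUST_KILL        (1 << 2)
#define PAGE_SHIFT          12

/* Simplified, hypothetical severity scale. */
enum severity { MCE_KEEP_SEVERITY, MCE_AR_SEVERITY, MCE_PANIC_SEVERITY };

/* Hypothetical subset of struct mce: only the fields used here. */
struct mce_mock {
	uint64_t addr;       /* physical address of the error */
	uint64_t mcgstatus;  /* MCG_STATUS snapshot */
};

/* Records what the mocked memory_failure() was asked to do. */
struct mf_call {
	bool called;
	unsigned long pfn;
	int flags;
};
static struct mf_call last_call;

/* Stand-in for memory_failure(): just remembers its arguments. */
static int memory_failure_mock(unsigned long pfn, int flags)
{
	last_call.called = true;
	last_call.pfn = pfn;
	last_call.flags = flags;
	return 0;
}

/*
 * Do the recovery work directly from the handler. 'm' lives on the
 * handler's stack, so no per-task copy of addr/restartable is needed.
 * Returns true if recovery was attempted.
 */
static bool handle_ar_error_inline(const struct mce_mock *m,
				   enum severity worst, bool from_user)
{
	int flags = MF_ACTION_REQUIRED;

	assert(m != NULL);

	if (worst != MCE_AR_SEVERITY)
		return false;
	if (!from_user)
		return false;  /* the real handler would panic here */

	/* Not restartable: make sure the task gets killed. */
	if (!(m->mcgstatus & MCG_STATUS_RIPV))
		flags |= MF_MUST_KILL;

	memory_failure_mock((unsigned long)(m->addr >> PAGE_SHIFT), flags);
	return true;
}
```

With mcgstatus == 0 the mock passes MF_ACTION_REQUIRED | MF_MUST_KILL;
with MCG_STATUS_RIPV set it passes only MF_ACTION_REQUIRED. Whether the
real memory_failure() can safely be called from that context is exactly
the open question.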

Or am I missing something here?

--Andy

>> Or is there a meaningful case where mce_notify_process needs to help
>> with recovery but the original MCE happened with !user_mode_vm(regs)?
>
> Well, for the !user_mode_vm(regs) case we panic anyway.
>
> Thanks Andy.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --



--
Andy Lutomirski
AMA Capital Management, LLC