Re: [PATCH v2 4/5] x86/mce: Simplify flow when handling recoverable memory errors

From: Andy Lutomirski
Date: Tue Nov 11 2014 - 12:15:53 EST


On Tue, Nov 11, 2014 at 8:30 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Tue, Nov 11, 2014 at 08:22:45AM -0800, Andy Lutomirski wrote:
>> I think it's okay-ish, but only if it's necessary, and I still don't
>> see why it's necessary.
>>
>> Can't you just remove TIF_MCE_NOTIFY entirely and just do all the
>> mce_notify_process work directly in do_machine_check? IOW, why do you
>> need to store any state per-task when it's already on the stack
>> anyway.
>
> I wish but memory_failure() can't run in #MC context as it noodles
> quite a lot and grabs all kinds of locks and does a bunch of other
> atomic-context-unsafe things.

Oh -- does it need to sleep?

I find myself wondering whether a much cleaner solution might be to
sync regs and switch stacks before invoking do_machine_check rather
than afterwards. Then do_machine_check would really be completely
non-atomic. It would add a few lines of asm, though.

--Andy

>
> And it needs to run *before* the process is killed as it looks at its
> pages.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --



--
Andy Lutomirski
AMA Capital Management, LLC