Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

From: Andy Lutomirski
Date: Wed Nov 12 2014 - 19:03:03 EST


On Wed, Nov 12, 2014 at 3:41 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>> v2 coming soon with these changes and some additional comment cleanups.
>

v2's not going to make a difference unless you're using uprobes at the
same time.

> So v1 + do_machine_check change is not surviving some real testing. I'm injecting and
> consuming errors sequentially with a small delay in between - so no fancy corner cases with
> multiple errors being processed ... we get all the way done with one error before we start
> the next. Test only survives about 400ish recoveries before Linux dies complaining:
> "Timeout synchronizing machine check over CPUs".
> This probably means that some cpu wandered into the weeds and never showed up in the
> handler.

In the interest of my sanity, can you add something like
BUG_ON(!user_mode_vm(regs)) or the mce_panic equivalent before calling
memory_failure?

What happens if there's a shared bank but the actual offender has a
higher order than the cpu that finds the error?

Is this something I can try under KVM?

--Andy