Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

From: Borislav Petkov
Date: Wed Nov 12 2014 - 12:31:00 EST


On Wed, Nov 12, 2014 at 05:17:55PM +0000, Luck, Tony wrote:
> > Not that easy for testing the #MC path - there we have to inject real
> > MCEs and then noodle through the memory_failure() code. I'd be very much
> > interested to see what would happen if two MCEs happen back-to-back with
> > your change, the second one being raised when we're on the kernel stack
> > and in memory_failure()...
>
> If the second one hits before we clear MCG_STATUS, then the processor resets.
>
> If the second one is caused by the recovery thread somewhere in memory_failure(),
> then Andy won't switch stacks, but we will declare this a fatal error and panic (we have
> no recovery from machine checks in the kernel).
>
> Otherwise the memory_failure() thread is the innocent bystander. If the affected thread
> decides to do recovery, then the first thread will be allowed to return and continue.
>
> I might worry a bit if the second error is another thread hitting the *same* page which
> hasn't finished processing yet ... then the second will chase along behind the first trying
> to fix the same problem. I *think* the first will complete and the second will just end
> up here:
>
> 	if (TestSetPageHWPoison(p)) {
> 		printk(KERN_ERR "MCE %#lx: already hardware poisoned\n", pfn);
> 		return 0;
> 	}
>
> which is really early in memory_failure().

Yeah, I meant this case: we have switched stacks, exited
do_machine_check() and are running the recovery code, and exactly then
we get another MCE. The code might handle it, as you say, but I'd like
to see this in action first to be sure; it is not trivial code.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.