RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace
From: Luck, Tony
Date: Wed Nov 12 2014 - 12:18:36 EST
> Not that easy for testing the #MC path - there we have to inject real
> MCEs and then noodle through the memory_failure() code. I'd be very much
> interested to see what would happen if two MCEs happen back-to-back with
> your change, the second one being raised when we're on the kernel stack
> and in memory_failure()...
If the second one hits before we clear MCG_STATUS, then the processor resets.
If the second one is caused by the recovery thread somewhere in memory_failure(),
then Andy won't switch stacks - but we will declare this a fatal error and panic (we have
no recovery from machine checks in the kernel).
Otherwise the memory_failure() thread is the innocent bystander. If the affected thread
decides to do recovery, then the first thread will be allowed to return and continue.
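To make the three cases above concrete, here is a rough user-space sketch of the decision. It is not kernel code: the enum, the function name and both boolean inputs are hypothetical names invented for illustration, and the real handler derives this information from machine check state rather than from two flags.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes described above. */
enum second_mce_outcome {
	OUTCOME_CPU_RESET,	/* hit before MCG_STATUS was cleared */
	OUTCOME_PANIC,		/* caused by kernel code: no recovery */
	OUTCOME_RECOVERABLE,	/* memory_failure() thread is a bystander */
};

/*
 * Hypothetical classifier: both parameters are assumptions standing in
 * for state the real machine check handler would have to work out.
 */
static enum second_mce_outcome
classify_second_mce(bool mcg_status_cleared, bool caused_by_kernel_context)
{
	/* Case 1: the hardware resets; software never gets to decide. */
	if (!mcg_status_cleared)
		return OUTCOME_CPU_RESET;

	/* Case 2: error triggered by the thread running memory_failure(). */
	if (caused_by_kernel_context)
		return OUTCOME_PANIC;

	/* Case 3: some other thread took the hit; recovery may proceed. */
	return OUTCOME_RECOVERABLE;
}
```

The ordering of the checks mirrors the text: the reset case comes first because it is decided by hardware before any handler runs.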
I might worry a bit if the second error is another thread hitting the *same* page which
hasn't finished processing yet ... then the second will chase along behind the first trying
to fix the same problem. I *think* the first will complete and the second will just end
up here:
	if (TestSetPageHWPoison(p)) {
		printk(KERN_ERR "MCE %#lx: already hardware poisoned\n", pfn);
		return 0;
	}
which is really early in memory_failure().
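For anyone who wants to see why the second caller bails out, here is a minimal user-space sketch of the same test-and-set pattern using C11 atomics. It is an illustration only: struct fake_page and claim_poisoned_page() are hypothetical names, and the kernel uses its own page flag bit operations rather than atomic_flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; only the poison flag is modeled. */
struct fake_page {
	atomic_flag hwpoison;
};

/*
 * Hypothetical model of the early check in memory_failure().
 * Returns true if this caller marked the page first and should go on to
 * do the recovery work, false if an earlier caller already marked it.
 */
static bool claim_poisoned_page(struct fake_page *p, unsigned long pfn)
{
	/* atomic_flag_test_and_set() sets the flag and returns its old value. */
	if (atomic_flag_test_and_set(&p->hwpoison)) {
		fprintf(stderr, "MCE %#lx: already hardware poisoned\n", pfn);
		return false;
	}
	return true;
}
```

With a page initialized as `struct fake_page page = { ATOMIC_FLAG_INIT };`, the first call to claim_poisoned_page() returns true and any later call on the same page returns false, which is the "second one chases the first and gives up early" behavior described above.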
-Tony