RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

From: Luck, Tony
Date: Tue Nov 11 2014 - 19:22:12 EST


Andy said:
> Yeah. But if you haven't cleared MCIP, you go boom, which is the same
> with pretty much any approach.

The current code has an ugly hole. The end of do_machine_check()
clears MCG_STATUS. At that point we are still running on the magic stack for
machine check exceptions ... if we take a machine check in the small window
from there until we get off this stack (iret) ... then we will enter the handler
back on the same stack that we haven't finished using yet. Bad things ensue.

Andy's RFC patch removes this window. We are already off on the normal stack
when we clear MCG_STATUS.MCIP ... so we enter the new machine check on the
magic stack, but then (I hope) transition to the kernel stack (pushing a new frame
below the other one).
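
To make the window concrete, here is a toy model in C. It is a sketch only, not kernel code: struct cpu_model, deliver_mc() and switch_to_kstack() are made-up names, and the "stack" is reduced to a single frame slot. The one assumption it encodes is that each machine check is delivered to the same fixed address at the top of the magic stack, so a nested delivery overwrites whatever frame is still there.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model only: none of these names exist in the kernel. */
struct frame {
	unsigned long ip;		/* saved instruction pointer */
};

struct cpu_model {
	struct frame ist_slot;		/* fixed frame slot on the magic #MC stack */
	struct frame kstack[4];		/* normal kernel stack, one frame per push */
	int kdepth;
};

/* Hardware delivery: the frame always lands at the same address. */
static void deliver_mc(struct cpu_model *c, unsigned long ip)
{
	c->ist_slot.ip = ip;
}

/* Stack switch: copy the frame off the magic stack onto the kernel stack. */
static struct frame *switch_to_kstack(struct cpu_model *c)
{
	c->kstack[c->kdepth] = c->ist_slot;
	return &c->kstack[c->kdepth++];
}

/* Does the first handler's frame survive a second machine check? */
static bool first_frame_survives(bool switch_stacks)
{
	struct cpu_model c;
	struct frame *mine;

	memset(&c, 0, sizeof(c));

	deliver_mc(&c, 0x1000);
	mine = switch_stacks ? switch_to_kstack(&c) : &c.ist_slot;

	/* MCG_STATUS.MCIP is cleared here; a second #MC arrives before iret. */
	deliver_mc(&c, 0x2000);
	if (switch_stacks)
		switch_to_kstack(&c);	/* new frame goes below the first one */

	return mine->ip == 0x1000;
}
```

In this model first_frame_survives(false) reports that the first frame was clobbered, and first_frame_survives(true) reports that it survived, because the copy was finished before the second delivery reused the slot.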

Boris said:
> This is the key: if I enable irqs and the process gets scheduled on
> another CPU, I lose. So I have to be able to say: before you run this
> task on any CPU, kill it.

I don't think it matters if we sleep and the task gets scheduled on another cpu. When
we run there we'll keep running the memory_failure() code that we were
in the middle of when we slept. The task can move around - we just need to
make sure it doesn't *return to the user-mode instruction* that hit the machine
check before we find the pte and zero it and mark the page as POISON.
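
The invariant can be written down as a small toy model in C. This is a sketch under stated assumptions, not kernel code: struct task_model and every function below are hypothetical names, and finish_recovery() only stands in for the remaining memory_failure() work. The point it illustrates is that the gate depends on the state of the recovery, not on which cpu the task happens to be running on.

```c
#include <stdbool.h>

/* Toy model; every name here is hypothetical, not a kernel API. */
struct task_model {
	int  cpu;		/* cpu the task is currently running on */
	bool pte_cleared;	/* pte for the bad page found and zeroed */
	bool page_poisoned;	/* page marked as POISON */
};

/* Migration is harmless: the recovery state travels with the task. */
static void migrate(struct task_model *t, int new_cpu)
{
	t->cpu = new_cpu;
}

/* Stand-in for the rest of the memory_failure() work. */
static void finish_recovery(struct task_model *t)
{
	t->pte_cleared = true;
	t->page_poisoned = true;
}

/* The only gate that matters: no return to user mode until both are done. */
static bool may_return_to_user(const struct task_model *t)
{
	return t->pte_cleared && t->page_poisoned;
}
```

A task that is migrated before finish_recovery() runs is still refused by may_return_to_user(); once finish_recovery() has run, on whichever cpu, it is allowed back.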

-Tony