RE: [RFC] x86_64: A real proposal for iret-less return to kernel

From: Luck, Tony
Date: Wed May 21 2014 - 18:33:02 EST


>> The recovery path has to do more than just send a signal - it needs to walk processes and
>> "mm"s to see which have mapped the physical address that the h/w told us has gone bad.
>
> I still feel like I'm missing something. If we interrupted user space
> code, then the context we're in should be identical to the context
> we'll get when we're about to return to userspace.

True. And this far along in do_machine_check() we have set all the other cpus
free, so they are heading back to whatever context we interrupted them in. So
we might be able to do all that other stuff inline here ... we interrupted user
mode, so we know we don't hold any locks. Other cpus are running, so they can
complete what they are doing to release any locks we might need.

But it will take a while (to scan all those processes). And we haven't yet
cleared MCG_STATUS ... so another machine check before we do that
would be fatal (x86 doesn't allow nesting). Even if we moved the work
after the clear of MCG_STATUS we'd still be vulnerable to a new machine
check on x86_64 because we are sitting on the one & only machine check
stack.
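
To make the ordering problem concrete, here is a toy user-space model in C. It
is only a sketch: the struct, the enum and every function name are made up for
illustration and none of this is the code in do_machine_check(). The one real
constant is the MCIP bit (bit 2) of IA32_MCG_STATUS. The model assumes a second
machine check shuts the cpu down while MCIP is set, and that once MCIP is
cleared a second machine check would start again at the top of the same
machine check stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of IA32_MCG_STATUS: machine check in progress. */
#define MCG_STATUS_MCIP (1ULL << 2)

/* Hypothetical model of the state discussed above; not a kernel structure. */
struct mce_model {
	uint64_t mcg_status;	/* simulated IA32_MCG_STATUS */
	bool on_mce_stack;	/* still running on the single #MC stack? */
};

/* What would happen if another machine check arrived right now. */
enum mce_outcome {
	MCE_HANDLED,		/* safe: a new handler can run normally */
	MCE_STACK_REUSED,	/* new #MC lands on the stack we are still using */
	MCE_SHUTDOWN,		/* MCIP still set: no nesting, cpu shuts down */
};

/* Entering the handler: hardware sets MCIP, we run on the #MC stack. */
static inline void enter_handler(struct mce_model *m)
{
	m->mcg_status |= MCG_STATUS_MCIP;
	m->on_mce_stack = true;
}

/* Software clears MCG_STATUS once it has finished reading the banks. */
static inline void clear_mcg_status(struct mce_model *m)
{
	m->mcg_status = 0;
}

/* Assumed step: the slow recovery work is moved off the #MC stack. */
static inline void leave_mce_stack(struct mce_model *m)
{
	m->on_mce_stack = false;
}

/* Classify the effect of a second machine check in the current state. */
static inline enum mce_outcome second_mce(const struct mce_model *m)
{
	if (m->mcg_status & MCG_STATUS_MCIP)
		return MCE_SHUTDOWN;
	if (m->on_mce_stack)
		return MCE_STACK_REUSED;
	return MCE_HANDLED;
}
```

In this model, calling second_mce() right after enter_handler() gives
MCE_SHUTDOWN, after clear_mcg_status() it gives MCE_STACK_REUSED, and only
after leave_mce_stack() does it give MCE_HANDLED. A long scan of processes
done inline would sit in one of the first two states the whole time.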

-Tony