Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

From: Andy Lutomirski
Date: Tue Nov 11 2014 - 18:21:26 EST


On Tue, Nov 11, 2014 at 3:09 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Tue, Nov 11, 2014 at 02:40:12PM -0800, Andy Lutomirski wrote:
>> I wonder what the IRET is for. There had better not be another magic
>> IRET unmask thing. I'm guessing that the actual semantics are that
>> nothing whatsoever can mask #MC, but that a second #MC when MCIP is
>> still set is a shutdown condition.
>
> Hmmm, both manuals are unclear as to what exactly reenables #MC. So
> forget about IRET and look at this: "When the processor receives a
> machine check when MCIP is set, it automatically enters the shutdown
> state." So this really reads like a second #MC while the first is
> being handled would shut down the system, regardless of whether I'm
> still in #MC context running the first #MC handler or not.
>
> I guess I needz me some hw people to actually confirm.
>
>> Define "atomic".
>>
>> You're still running with irqs off and MCIP set. At some point,
>
> Yes, I need to be atomic wrt another #MC so that I can read out the
> MCA MSRs in time and undisturbed.
>
>> you're presumably done with all of the machine check registers, and
>> you can clear MCIP. Now, if current == victim, you can enable irqs
>> and do whatever you want.
>
> This is the key: if I enable irqs and the process gets scheduled on
> another CPU, I lose. So I have to be able to say: before you run this
> task on any CPU, kill it.

Why do you lose? With my patch applied, you are that process, and the
process can't possibly return to user space until you return from
do_machine_check. In other words, it works kind of like a page fault.

>
>> In my mind, the benefit is that you don't need to think about how to
>> save your information and arrange to get called back the next time
>> that the victim task is in a non-atomic context, since you *are* the
>> victim task and you're running in normal irqs-disabled kernel mode.
>>
>> In contrast, with the current entry code, if you enable IRQs or do
>> anything that could sleep, you're on the wrong (IST) stack, so you'll crash.
>> That means that taking mutexes, even after clearing MCIP, is
>> impossible.
>
> Hmm, it is late here and I need to think about this with a clear head
> again, but I think I can see the benefit of this to a certain extent.
> However(!), I need to be able to run undisturbed and do the minimum work
> in the #MC handler before I reenable MCEs.
>
> But Tony also has a valid point about what happens if I get another
> MCE while doing the memory_failure() dance. I guess if
> memory_failure() takes proper locks, the second #MC will get to wait
> until the first is done. But who knows in reality ...

Yeah. But if you haven't cleared MCIP, you go boom, which is the same
with pretty much any approach.

--Andy