Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs

From: Andy Lutomirski
Date: Fri Mar 06 2015 - 14:24:06 EST


On Fri, Mar 6, 2015 at 9:33 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Mar 5, 2015 at 11:58 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>
>> math_state_restore() was historically called with irqs disabled,
>> because that's how the hardware generates the trap, and also because
>> back in the days it was possible for it to be an asynchronous
>> interrupt and interrupt handlers run with irqs off.
>>
>> These days it's always an instruction trap, and furthermore it does
>> inevitably complex things such as memory allocation and signal
>> processing, which is not done with irqs disabled.
>>
>> So keep irqs enabled.
>
> I agree with the "keep irqs enabled".
>
> However, I do *not* agree with the actual patch, which doesn't do that at all.
>> @@ -844,8 +844,9 @@ void math_state_restore(void)
>> {
>> struct task_struct *tsk = current;
>>
>> + local_irq_enable();
>> +
>
> There's a big difference between "keep interrupts enabled" (ok) and
> "explicitly enable interrupts in random contexts" (*NOT* ok).
>
> So get rid of the "local_irq_enable()" entirely, and replace it with a
>
> WARN_ON_ONCE(irqs_disabled());

I like this a lot better. Ingo's patch scares me because in-kernel
FPU users are fundamentally atomic: if we're using kernel FPU *and*
current has important FPU state, we can't schedule safely because
there's nowhere to save the in-kernel FPU state. I don't see how we
would end up in this situation with CR0.TS set, but if we somehow do,
then we'll corrupt something with your patch.

>
> and let's just fix the cases where this actually gets called with
> interrupts off. In particular, let's just make the
> device_not_available thing use a trap gate, not an interrupt gate. And
> then remove the "conditional_sti()" stuff.

Please don't. IMO it's really nice that we don't use trap gates at
all on x86_64, and I find the conditional_sti thing much nicer than
having to audit all of the entry code to see whether it's safe to run
it with IRQs on.

This isn't to say that using trap gates would be a terrible idea in
general. I think I'd be okay with saying that non-IST
synchronous-only never-from-random-places entries should be trap gates
in general. We could then audit the entry code and convert various
entries.

Thinks that maybe shouldn't be trap gates:

- Anything asynchronous.
- Page faults. The entry code can cause page faults recursively due
to lazy vmap page table fills. Maybe this doesn't matter.
- Anything using IST. That way lies madness.

FWIW, I just started auditing the entry code a bit, and it's currently
unsafe to use trap gates. The IRQ entry code in the "interrupt" macro
does this:

testl $3, CS-RBP(%rsp)
je 1f
SWAPGS

That will go boom quite nicely if it happens very early in device_not_available.

I suspect that fixing that will be slow and unpleasant and will
significantly outweigh any benefit.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/