Re: [PATCH] Make math_state_restore() save and restore the interrupt flag

From: Linus Torvalds
Date: Thu Jan 30 2014 - 17:24:34 EST


On Thu, Jan 30, 2014 at 2:01 PM, Nate Eldredge
<nate@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> If math_state_restore() is called in a task that has not used math, it
> needs to allocate some memory (via init_fpu()). Since this can sleep,
> it enables interrupts first. Currently, it always disables them
> afterwards, regardless of whether or not they were enabled on entry.
> (See commit aa283f4927 where this was introduced.) This doesn't make
> sense, so instead have it put interrupts back the way they were.

Hmm. What doesn't make sense is to have interrupts disabled in the
caller - since clearly math_state_restore() may actually let
interrupts in.

So while I definitely think your patch fixes a bug, I'm wondering if
we shouldn't look deeper at this issue. The comment above says

* Must be called with kernel preemption disabled (eg with local
* local interrupts as in the case of do_device_not_available).

which *kind* of makes sense, but at the same time only reinforces that
whole "if there are reasons why the irq has to be disabled, we can't
just randomly enable it for allocations".

So I really get the feeling that there are more bugs than just the "it
exits with irqs disabled" issue.

> The aesni-intel code (encrypting the core
> file that we are writing) needs the FPU and quite properly wraps its
> code in kernel_fpu_{begin,end}(), the latter of which calls
> math_state_restore().

Ok, and this is interesting too.

kernel_fpu_{begin|end}() shouldn't mess around with tsk_used_math(), I
feel. If the task hasn't used math, no reason to allocate any
save-space, because kernel_fpu_end() will just throw it all away
anyway.

Plus we want to be able to do those things from interrupts, and:
- we can't have blocking allocations
- we should try to not mess with these kinds of process states from
interrupts anyway

So I think that math_state_restore() and the use_eager_fpu() changes
were actually seriously buggy here wrt that whole !tsk_used_math()
case.

I'm adding in some people here, because I think in the end this bug
was introduced by commit 304bceda6a18 ("x86, fpu: use non-lazy fpu
restore for processors supporting xsave") that introduced that
math_state_restore() in kernel_fpu_end(), but we have other commits
(like 5187b28ff08: "x86: Allow FPU to be used at interrupt time even
with eagerfpu") that seem tangential too and might be part of why it
actually *triggers* now.

Comments?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/