Re: WARNING: CPU: 0 PID: 3031 at ./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130()
From: Borislav Petkov
Date: Wed Feb 17 2016 - 04:29:28 EST
On Wed, Feb 17, 2016 at 09:16:46AM +0100, Ingo Molnar wrote:
> So I'm wondering why this started triggering only now. Is this a pre-existing bug
> that somehow got triggered via:
>
> 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs
>
> ?
Well, that's an interesting question. See, the thing is, I triggered
this only *once* by accident and I haven't seen it ever since.

The "reliable" "reproducer" I used to debug this was Andy's suggestion
to stick a schedule() in __fpu__restore_sig().

So the answer to that question is not easy.
BUT(!), regardless, the bug still needs to be fixed because my tracing
here
https://lkml.kernel.org/r/20160215191422.GB32716@xxxxxxx
showed that getting preempted after setting
	fpu->fpstate_active = 1;
leads to the WARN. Because - and please double-check me on that - when
we're in __switch_to() and the next task, i.e. the one we're about to
switch to, already has ->fpstate_active set, then on entering
switch_fpu_prepare() it does:
	fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
		      new_fpu->fpstate_active &&
		      ^^^^^^^^^^^^^^^^^^^^^^^
so that fpu.preload is set now.
A bit later in that same function:
	/* Don't change CR0.TS if we just switch! */
	if (fpu.preload) {
		new_fpu->counter++;
		__fpregs_activate(new_fpu);
		^^^^^^^^^^^^^^^^^
->fpregs_active gets set here and when the task returns to
__fpu__restore_sig(), fpu__restore() sets it again, leading to the WARN.
Mind you, this happens on 32-bit only because there we sigreturn with
irqs enabled. Look at the call trace.
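
To make the ordering above easier to follow, here is a minimal userspace
sketch of it in C. It is not kernel code: struct toy_fpu and all the
toy_* functions are hypothetical stand-ins that model only the two flags
discussed here. It assumes the remaining preload conditions are true and
that the activation done on the fpu__restore() side is the one that
checks ->fpregs_active and warns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct fpu: only the fields discussed above. */
struct toy_fpu {
	bool fpstate_active;	/* task has FPU state set up */
	bool fpregs_active;	/* that state is live in the registers */
	unsigned int counter;
};

static int warn_count;

/* Assumed model of the checked activation reached via fpu__restore(). */
static void toy_fpregs_activate(struct toy_fpu *fpu)
{
	if (fpu->fpregs_active) {
		warn_count++;
		printf("WARN: fpregs_active already set\n");
	}
	fpu->fpregs_active = true;
}

/* Models the preload branch of switch_fpu_prepare() for the next task. */
static void toy_switch_in(struct toy_fpu *new_fpu)
{
	/* Other preload conditions are assumed to hold. */
	bool preload = new_fpu->fpstate_active;

	if (preload) {
		new_fpu->counter++;
		/* Unchecked activation, like __fpregs_activate(). */
		new_fpu->fpregs_active = true;
	}
}

/*
 * Models the tail of the sigreturn restore path. 'preempted' plays the
 * role of the schedule() stuck in right after fpstate_active is set.
 * Returns the number of warnings seen.
 */
static int toy_restore_sig(bool preempted)
{
	struct toy_fpu fpu = { 0 };

	warn_count = 0;
	fpu.fpstate_active = true;

	if (preempted)
		toy_switch_in(&fpu);	/* scheduled out and back in */

	toy_fpregs_activate(&fpu);	/* second activation warns */
	return warn_count;
}
```

Calling toy_restore_sig(false) models the common case where nothing
preempts us in the window, and toy_restore_sig(true) models the case
with the context switch squeezed in between.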
> If yes then we need a plausible theory of how that never triggered on
> modern Intel CPUs that had eagerfpu enabled for years.
AFAICT, it triggers - and the window is very small at that - only on
32-bit. If at all.
> Or perhaps was it caused by one of the other changes in tip:x86/fpu:
>
> c6ab109f7e0e x86/fpu: Speed up lazy FPU restores slightly
> a20d7297045f x86/fpu: Fold fpu_copy() into fpu__copy()
> 5ed73f40735c x86/fpu: Fix FNSAVE usage in eagerfpu mode
> 4ecd16ec7059 x86/fpu: Fix math emulation in eager fpu mode
>
> ?
I can certainly try to test all those but I don't have a reliable
reproducer. The only thing I could do is check out each of those commits
and stick a schedule() in __fpu__restore_sig() and see what happens.
But if my analysis above is right, none of those would matter because of
how the WARN gets triggered...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.