[PATCH 2/2] x86, fpu: fix boot time GPF with eagerfpu=off

From: Dave Hansen
Date: Tue Apr 28 2015 - 13:00:05 EST

From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>

I'm seeing some widespread failures to boot if I boot with
eagerfpu=off. This only happens on hardware that supports
'xsaves', which is not widely available, so it is not
seen widely in practice.

The failure comes from init hitting a GPF very early in its
execution. The GPF comes from a 'xrstors' operation that
the kernel performs which kills init.

The problem is essentially that "fpu_finit()" does not have
any knowledge about how to initialize an FPU buffer
(tsk->fpu->state) which will be restored with xsaves. This
does not hit the use_eager_fpu() xsaves code because it
always does a __save_fpu() to initialize the FPU state from
the the process doing the fork()'s *registers* instead of
the 'init_xstate_buf' via init_fpu().

The code calls fx_finit(&fpu->state->fxsave) on the FPU state
to initialize the 'fxsave' style FPU buffer and then soon
after does an 'xrstors'-style restore. That promptly GPF's.

As far as I can tell init_fpu() is only called when a task
has !used_math(). That never happens in the eagerfpu=on

We never noticed this problem because this commit:

e7d820a5: x86, xsave: Support eager-only xsave features, add MPX support

effectively disabled lazy FPU mode on all hardware that
supports xsaves. I think that commit itself is wrong
and there is no fundamental reason we can not support
lazy FPU mode with the MPX registers being xsave'd. But,
that's another patch.

Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>

b/arch/x86/kernel/i387.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

diff -puN arch/x86/kernel/i387.c~x86-fpu-fix-boot-time-gpf-with-eagerfpuoff arch/x86/kernel/i387.c
--- a/arch/x86/kernel/i387.c~x86-fpu-fix-boot-time-gpf-with-eagerfpuoff 2015-04-28 09:51:55.634326943 -0700
+++ b/arch/x86/kernel/i387.c 2015-04-28 09:53:23.065270248 -0700
@@ -19,6 +19,7 @@
#include <asm/i387.h>
#include <asm/fpu-internal.h>
#include <asm/user.h>
+#include <asm/xcr.h>

static DEFINE_PER_CPU(bool, in_kernel_fpu);

@@ -218,6 +219,41 @@ void fpu_init(void)

+ * This will take the 'xsave' portion of a 'thread_xstate'
+ * and get it in to a state where an 'xrstor' or 'xrstors'
+ * of the buffer will result in all of the xsave state
+ * components being set to their initialized state.
+ *
+ * We initialize the minimal set of fields in the header
+ * and rely on the CPU to initialize the registers
+ * themselves.
+ */
+static void xsave_fpu_buf_init(struct xsave_hdr_struct *xhdr)
+ /*
+ * This will cause the xrstor[s] to initialize
+ * all the xsave state components.
+ */
+ xhdr->xstate_bv = 0;
+ /*
+ * xrstors will GPF if this bit is not set
+ */
+ if (boot_cpu_has(X86_FEATURE_XSAVEC))
+ else
+ xhdr->xcomp_bv = 0;
+ /*
+ * SDM Says we will GPF:
+ * "If bytes 63:16 of the XSAVE header are not all zero."
+ * If this reserved size ever changes, we will need to
+ * update this code to look at and conditionally
+ * initialize the new fields, just like ->xcomp_bv above.
+ */
+ BUILD_BUG_ON(sizeof(xhdr->reserved) != 48);
+ memset(xhdr->reserved, 0, sizeof(xhdr->reserved));
void fpu_finit(struct fpu *fpu)
if (!cpu_has_fpu) {
@@ -225,6 +261,15 @@ void fpu_finit(struct fpu *fpu)

+ /*
+ * The xsave code does not require the entire buffer
+ * to be zeroed, only a chunk of the header.
+ */
+ if (cpu_has_xsave) {
+ xsave_fpu_buf_init(&fpu->state->xsave.xsave_hdr);
+ return;
+ }
memset(fpu->state, 0, xstate_size);

if (cpu_has_fxsr) {
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/