[PATCH 148/208] x86/fpu: Optimize fpu_copy() some more on lazy switching systems

From: Ingo Molnar
Date: Tue May 05 2015 - 13:54:11 EST


The current fpu_copy() code on lazy switching CPUs always saves
into the current fpstate and then copies it over into the child
context:

preempt_disable();
if (!copy_fpregs_to_fpstate(src_fpu))
fpregs_deactivate(src_fpu);
preempt_enable();
memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);

That memcpy() can be avoided on all lazy switching setups except
really old FNSAVE-only systems: change fpu_copy() to directly save
into the child context, for both the lazy and the eager context
switching case.

Note that we still have to do a memcpy() back into the parent
context in the FNSAVE case, but this won't be executed on the
majority of x86 systems that got built in the last 10 years or so.

Reviewed-by: Borislav Petkov <bp@xxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
arch/x86/kernel/fpu/core.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7fdeabc1f2af..8ae4c2450c2b 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -220,16 +220,35 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
{
WARN_ON(src_fpu != &current->thread.fpu);

- if (use_eager_fpu()) {
+ /*
+ * Don't let 'init optimized' areas of the XSAVE area
+ * leak into the child task:
+ */
+ if (use_eager_fpu())
memset(&dst_fpu->state.xsave, 0, xstate_size);
- copy_fpregs_to_fpstate(dst_fpu);
- } else {
- preempt_disable();
- if (!copy_fpregs_to_fpstate(src_fpu))
- fpregs_deactivate(src_fpu);
- preempt_enable();
- memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
+
+ /*
+ * Save current FPU registers directly into the child
+ * FPU context, without any memory-to-memory copying.
+ *
+ * If the FPU context got destroyed in the process (FNSAVE
+ * done on old CPUs) then copy it back into the source
+ * context and mark the current task for lazy restore.
+ *
+ * We have to do all this with preemption disabled,
+ * mostly because of the FNSAVE case, because in that
+ * case we must not allow preemption in the window
+ * between the FNSAVE and us marking the context lazy.
+ *
+ * It shouldn't be an issue as even FNSAVE is plenty
+ * fast in terms of critical section length.
+ */
+ preempt_disable();
+ if (!copy_fpregs_to_fpstate(dst_fpu)) {
+ memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
+ fpregs_deactivate(src_fpu);
}
+ preempt_enable();
}

int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/