Re: [PATCH 1/8] x86, fpu: unlazy_fpu: don't reset thread.fpu_counter

From: Borislav Petkov
Date: Mon Feb 16 2015 - 12:05:27 EST


On Fri, Feb 06, 2015 at 03:01:58PM -0500, riel@xxxxxxxxxx wrote:
> From: Oleg Nesterov <oleg@xxxxxxxxxx>
>
> It is not clear why the "else" branch clears ->fpu_counter, this makes
> no sense.

See below for why.

> If use_eager_fpu() then this has no effect. Otherwise, if we actually
> wanted to prevent fpu preload after the context switch we would need to
> reset it unconditionally, even if __thread_has_fpu().

and AFAICT, switch_fpu_prepare() already does that so I'd go ahead and
venture a guess that this has survived all the cleanups (I had to go
surf down memory lane to find Arjan's patch) over the years and is still
there for no apparent reason.

> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxx>
> ---
> arch/x86/kernel/i387.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
> index 47348653503a..c3b92c0975cd 100644
> --- a/arch/x86/kernel/i387.c
> +++ b/arch/x86/kernel/i387.c
> @@ -122,8 +122,7 @@ void unlazy_fpu(struct task_struct *tsk)
> if (__thread_has_fpu(tsk)) {
> __save_init_fpu(tsk);
> __thread_fpu_end(tsk);
> - } else
> - tsk->thread.fpu_counter = 0;
> + }

... and so by looking at the unlazy_fpu() call sites, I think this makes
sense.

So how's that for a commit message instead:

---
x86, fpu, unlazy_fpu: Don't reset thread.fpu_counter

The "else" branch clears ->fpu_counter as a remnant of the lazy FPU
usage counting:

e07e23e1fd30 ("[PATCH] non lazy "sleazy" fpu implementation")

However, switch_fpu_prepare() does this now so that else branch is
superfluous.

If we do use_eager_fpu(), then this has no effect. Otherwise, if we
actually wanted to prevent fpu preload after the context switch we would
need to reset it unconditionally, even if __thread_has_fpu().
---

?

Thanks.

---
e07e23e1fd300:

From: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Date: Tue, 26 Sep 2006 10:52:36 +0200
Subject: [PATCH] [PATCH] non lazy "sleazy" fpu implementation

Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily. This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time). However for very frequent FPU users... you
take an extra trap every context switch.

The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch. If the app indeed uses the FPU, the
trap is avoided. (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).

After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again). The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.

[akpm@xxxxxxxx: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
Signed-off-by: Andi Kleen <ak@xxxxxxx>
Cc: Andi Kleen <ak@xxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---
arch/x86_64/kernel/process.c | 10 ++++++++++
arch/x86_64/kernel/traps.c | 1 +
include/asm-x86_64/i387.h | 5 ++++-
include/linux/sched.h | 9 +++++++++
4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
index 6fbd19564e4e..9e9a70e50c72 100644
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -552,6 +552,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(init_tss, cpu);

+ /* we're going to use this soon, after a few expensive things */
+ if (next_p->fpu_counter>5)
+ prefetch(&next->i387.fxsave);
+
/*
* Reload esp0, LDT and the page table pointer:
*/
@@ -629,6 +633,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
|| test_tsk_thread_flag(prev_p, TIF_IO_BITMAP))
__switch_to_xtra(prev_p, next_p, tss);

+ /* If the task has used fpu the last 5 timeslices, just do a full
+ * restore of the math state immediately to avoid the trap; the
+ * chances of needing FPU soon are obviously high now
+ */
+ if (next_p->fpu_counter>5)
+ math_state_restore();
return prev_p;
}

diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
index 28e53342f294..ffc40cff1e07 100644
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -1136,6 +1136,7 @@ asmlinkage void math_state_restore(void)
init_fpu(me);
restore_fpu_checking(&me->thread.i387.fxsave);
task_thread_info(me)->status |= TS_USEDFPU;
+ me->fpu_counter++;
}

void __init trap_init(void)
diff --git a/include/asm-x86_64/i387.h b/include/asm-x86_64/i387.h
index cba8a3b0cded..60c0f4853fdb 100644
--- a/include/asm-x86_64/i387.h
+++ b/include/asm-x86_64/i387.h
@@ -24,6 +24,7 @@ extern unsigned int mxcsr_feature_mask;
extern void mxcsr_feature_mask_init(void);
extern void init_fpu(struct task_struct *child);
extern int save_i387(struct _fpstate __user *buf);
+extern asmlinkage void math_state_restore(void);

/*
* FPU lazy state save handling...
@@ -31,7 +32,9 @@ extern int save_i387(struct _fpstate __user *buf);

#define unlazy_fpu(tsk) do { \
if (task_thread_info(tsk)->status & TS_USEDFPU) \
- save_init_fpu(tsk); \
+ save_init_fpu(tsk); \
+ else \
+ tsk->fpu_counter = 0; \
} while (0)

/* Ignore delayed exceptions from user space */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 34ed0d99b1bd..807556c5bcd2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -865,6 +865,15 @@ struct task_struct {
struct key *thread_keyring; /* keyring private to this thread */
unsigned char jit_keyring; /* default keyring to attach requested keys to */
#endif
+ /*
+ * fpu_counter contains the number of consecutive context switches
+ * that the FPU is used. If this is over a threshold, the lazy fpu
+ * saving becomes unlazy to save the trap. This is an unsigned char
+ * so that after 256 times the counter wraps and the behavior turns
+ * lazy again; this to deal with bursty apps that only use FPU for
+ * a short time
+ */
+ unsigned char fpu_counter;
int oomkilladj; /* OOM kill score adjustment (bit shift). */
char comm[TASK_COMM_LEN]; /* executable name excluding path
- access with [gs]et_task_comm (which lock
--
2.2.0.33.gc18b867

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/