[PATCH v4 1/2] x86/fpu: track AVX-512 usage of tasks

From: Aubrey Li
Date: Tue Dec 11 2018 - 02:42:58 EST


User space tools that do automated task placement need information
about the AVX-512 usage of tasks, because AVX-512 usage can lower the
core's turbo frequency and thereby slow down the task running on the
sibling CPU.

The XSAVE hardware structure has bits that indicate when valid state
is present in registers unique to AVX-512 use. Use these bits to
indicate when AVX-512 has been in use, and add per-task AVX-512 state
tracking to the context switch path.

The tracking turns on the usage flag at the next context switch of
the task, but requires 3 consecutive context switches with no usage
to clear it. This decay is required because well-written AVX-512
applications are expected to clear this state when not actively using
AVX-512 registers, so a single snapshot taken at context switch can
miss a task that is still using AVX-512.

Although this mechanism is imprecise and can theoretically produce both
false positives and false negatives, it has been measured to be precise
enough to be useful under real-world workloads like TensorFlow and LINPACK.

If higher precision is required, user space tools are advised to combine
this with PMU-based mechanisms.

Signed-off-by: Aubrey Li <aubrey.li@xxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
---
arch/x86/include/asm/fpu/internal.h | 22 ++++++++++++++++++++++
arch/x86/include/asm/fpu/types.h | 8 ++++++++
2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index a38bf5a1e37a..0da74d63ba14 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -275,6 +275,27 @@ static inline void copy_fxregs_to_kernel(struct fpu *fpu)
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")

+#define AVX512_STATE_DECAY_COUNT 3
+/*
+ * This function is called during context switch to update AVX512 state
+ */
+static inline void update_avx512_state(struct fpu *fpu)
+{
+ /*
+ * AVX512 state is tracked here because its use is known to slow
+ * the max clock speed of the core.
+ *
+ * However, AVX512-using tasks are expected to clear this state when
+ * not actively using these registers. Thus, this tracking mechanism
+ * can miss. To ensure that false-negatives do not immediately show
+ * up, decay the usage count over time.
+ */
+ if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+ fpu->avx512_usage = AVX512_STATE_DECAY_COUNT;
+ else if (fpu->avx512_usage)
+ fpu->avx512_usage--;
+}
+
/*
* This function is called only during boot time when x86 caps are not set
* up and alternative can not be used yet.
@@ -411,6 +432,7 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
{
if (likely(use_xsave())) {
copy_xregs_to_kernel(&fpu->state.xsave);
+ update_avx512_state(fpu);
return 1;
}

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecf..313b134d3ca3 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -302,6 +302,14 @@ struct fpu {
*/
unsigned char initialized;

+ /*
+ * @avx512_usage:
+ *
+ * Records the usage of AVX512 registers. A non-zero value indicates
+ * that these AVX512 registers recently had valid state.
+ */
+ unsigned char avx512_usage;
+
/*
* @state:
*
--
2.17.1