Re: [PATCH v6 1/3] x86/fpu: Measure the Latency of XSAVES and XRSTORS

From: Yi Sun
Date: Wed Sep 06 2023 - 04:48:07 EST


On 02.09.2023 12:49, Ingo Molnar wrote:

* Yi Sun <yi.sun@xxxxxxxxx> wrote:

+#define XSTATE_XSAVE(fps, lmask, hmask, err) \
+ do { \
+ struct fpstate *f = fps; \
+ u64 tc = -1; \
+ if (xsave_tracing_enabled()) \
+ tc = trace_clock(); \
+ __XSTATE_XSAVE(&f->regs.xsave, lmask, hmask, err); \
+ if (xsave_tracing_enabled()) \
+ trace_x86_fpu_latency_xsave(f, trace_clock() - tc);\
+ } while (0)
+
/*
* Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
* XSAVE area format.
*/
-#define XSTATE_XRESTORE(st, lmask, hmask) \
+#define __XSTATE_XRESTORE(st, lmask, hmask) \
asm volatile(ALTERNATIVE(XRSTOR, \
XRSTORS, X86_FEATURE_XSAVES) \
"\n" \
@@ -140,6 +168,17 @@ static inline u64 xfeatures_mask_independent(void)
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")

+#define XSTATE_XRESTORE(fps, lmask, hmask) \
+ do { \
+ struct fpstate *f = fps; \
+ u64 tc = -1; \
+ if (xrstor_tracing_enabled()) \
+ tc = trace_clock(); \
+ __XSTATE_XRESTORE(&f->regs.xsave, lmask, hmask); \
+ if (xrstor_tracing_enabled()) \
+ trace_x86_fpu_latency_xrstor(f, trace_clock() - tc);\
+ } while (0)
+
#if defined(CONFIG_X86_64) && defined(CONFIG_X86_DEBUG_FPU)
extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
#else
@@ -184,7 +223,7 @@ static inline void os_xsave(struct fpstate *fpstate)
WARN_ON_FPU(!alternatives_patched);
xfd_validate_state(fpstate, mask, false);

- XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
+ XSTATE_XSAVE(fpstate, lmask, hmask, err);

/* We should never fault when copying to a kernel buffer: */
WARN_ON_FPU(err);
@@ -201,7 +240,7 @@ static inline void os_xrstor(struct fpstate *fpstate, u64 mask)
u32 hmask = mask >> 32;

xfd_validate_state(fpstate, mask, true);
- XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
+ XSTATE_XRESTORE(fpstate, lmask, hmask);
}

Instead of adding overhead to the regular FPU context saving/restoring code
paths, could you add a helper function that has tracing code included, but
which isn't otherwise used - and leave the regular code with no tracing
overhead?

This puts a bit of a long-term maintenance focus on making sure that the
traced functionality won't bitrot, but I'd say that's preferable to adding
tracing overhead.

Hi Ingo,
It is actually a helper function, and I have renamed the original function
to __XSTATE_XSAVE. This function is only used in this particular context.

Furthermore, according doc static-keys.txt, the condition
xrstor_tracing_enabled() would introduce only a minimal overhead when the
trace is disabled. I believe it is a negligible impact on the performance
when the trace is disabled.

Please let me know if you have any questions or concerns about this function.

Thanks
--Sun, Yi