From: Yi Sun
Sent: 23 July 2022 09:38...
Calculate the latency of the xsave and xrstor instructions with new
trace points x86_fpu_latency_xsave and x86_fpu_latency_xrstor.
The delta TSC can be calculated within a single trace event. Another
option considered was to have two separate trace events marking the
start and finish of the xsave/xrstor instructions, with the delta TSC
calculated from the two trace points in user space. However, the trace
function itself added significant overhead to that measurement.
In internal testing, the single trace point option which is
implemented here proved to be more accurate.

I've done some experiments that measure short instruction latencies.
Basically I found:
1) You need a suitable serialising instruction before and after
the code being tested - otherwise it can overlap whatever
you are using for timing.
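
A minimal user-space sketch of the fencing described in 1). It is not the
code from the patch or from my experiments, and all names in it
(fenced_timestamp, measure_min_cycles, sample_work) are made up for
illustration. It assumes x86-64 with GCC or Clang, and it assumes lfence is
an adequate fence around rdtsc, which depends on the CPU; a fully
serialising instruction such as cpuid may be needed instead. On other
architectures it falls back to clock_gettime() so it still builds.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() in the fallback path */
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Read the TSC with a fence on each side so that the read cannot be
 * reordered with the code being timed. Assumption: lfence is sufficient
 * on the CPU in use. */
static inline uint64_t fenced_timestamp(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
#else
#include <time.h>

/* Fallback for non-x86 builds: nanoseconds from the monotonic clock. */
static inline uint64_t fenced_timestamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical stand-in for the instruction sequence under test. */
static volatile uint64_t sink;
static void sample_work(void)
{
    sink += 1;
}

/* Time fn() 'runs' times and return the smallest delta seen. Taking the
 * minimum discards samples inflated by interrupts or cache misses. The
 * result still includes the cost of the fences and timestamp reads. */
static uint64_t measure_min_cycles(void (*fn)(void), int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = fenced_timestamp();
        fn();
        uint64_t end = fenced_timestamp();
        uint64_t delta = end - start;
        if (delta < best)
            best = delta;
    }
    return best;
}
```

Calling measure_min_cycles(sample_work, 1000) gives a rough lower bound for
the sampled code plus the measurement overhead. Timing an empty function the
same way and subtracting gives an estimate of the overhead alone.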