On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
Is that measured on the same machine, i.e., do you force V3 on Skylake?
It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
rather limited use (or even counterproductive, in our case) for a counter
that's already restricted to ring 3.
It's much faster. The PMI cost goes down dramatically.
I still think the right fix is to add a perf event opt-out and let it be
used by rr.
V3 is without counter freezing.
V4 is with counter freezing.
The values are the average cost of the PMI handler in ns
(lower is better).
perf options                  V3(ns)  V4(ns)  delta
-c 100000                       1088     894   -18%
-g -c 100000                    1862    1646   -12%
--call-graph lbr -c 100000      3649    3367    -8%
--call-graph dwarf -c 100000    2248    1982   -12%
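As a quick arithmetic check of the table, the delta column is the relative change from V3 to V4. This small Python sketch, with the numbers copied from the table, recomputes it and also prints the absolute saving per PMI. The names RESULTS and delta_percent are just illustrative.

```python
# Average PMI handler cost in ns, copied from the table:
# (perf options, V3 = without counter freezing, V4 = with counter freezing)
RESULTS = [
    ("-c 100000", 1088, 894),
    ("-g -c 100000", 1862, 1646),
    ("--call-graph lbr -c 100000", 3649, 3367),
    ("--call-graph dwarf -c 100000", 2248, 1982),
]


def delta_percent(v3_ns: int, v4_ns: int) -> int:
    """Relative change from V3 to V4, rounded to the nearest whole percent."""
    return round((v4_ns - v3_ns) * 100 / v3_ns)


for opts, v3, v4 in RESULTS:
    # Columns: options, V3 cost, V4 cost, absolute change (ns), relative change
    print(f"{opts:<30}{v3:>8}{v4:>8}{v4 - v3:>8}{delta_percent(v3, v4):>6}%")
```

The absolute saving stays between 194 and 282 ns across the four modes, so the percentage shrinks as the rest of the handler gets more expensive (for example with LBR call graphs).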
All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the
PMU interrupt handler, or am I missing something?
Or does it save two, i.e., also the wrmsr(GLOBAL_CTRL) at the end that
re-enables the counters?
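One way to frame the "one or two writes" question is to count MSR writes per PMI. The following is a simplified, hypothetical Python model, not the kernel's actual handler. It assumes a V3 handler that disables the counters through GLOBAL_CTRL on entry, acks the overflow status, and re-enables GLOBAL_CTRL on exit, and a V4 handler where the hardware freezes the counters on PMI and a single write to the status-reset MSR both acks the overflow and clears the freeze bit. Counter reloads, LBR/PEBS work and the APIC ack are left out because they are assumed to be identical in both modes; the MSR and bit names are illustrative labels only.

```python
# Hypothetical, simplified model of MSR writes per PMI. Counter reloads,
# LBR/PEBS handling and the APIC ack are assumed identical and omitted.


def v3_handler_writes() -> list[str]:
    # Assumed V3 flow: software stops the counters, acks, then restarts them.
    return [
        "wrmsr(GLOBAL_CTRL, 0)",            # disable all counters on entry
        "wrmsr(GLOBAL_OVF_CTRL, status)",   # ack the overflow bits
        "wrmsr(GLOBAL_CTRL, enabled_mask)", # re-enable counters on exit
    ]


def v4_freeze_handler_writes() -> list[str]:
    # Assumed V4 flow with FREEZE_PERFMON_ON_PMI: hardware freezes the
    # counters on PMI; one write acks the overflow and clears the freeze bit.
    return [
        "wrmsr(GLOBAL_STATUS_RESET, status | CTR_FRZ)",
    ]


v3_count = len(v3_handler_writes())
v4_count = len(v4_freeze_handler_writes())
saved = v3_count - v4_count
print(f"V3: {v3_count} writes, V4: {v4_count} writes, saved: {saved}")
```

Under these assumptions the difference is two writes, the GLOBAL_CTRL write on entry and the one on exit. Whether the real handler achieves exactly that depends on details the model ignores, such as when the status is acked relative to returning from the interrupt.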