Re: [REGRESSION] x86, perf: counter freezing breaks rr

From: Liang, Kan
Date: Thu Nov 29 2018 - 09:50:17 EST

On 11/27/2018 8:25 PM, Stephane Eranian wrote:
On Tue, Nov 27, 2018 at 3:36 PM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:

It does seem that FREEZE_PERFMON_ON_PMI (misnamed as it is) is of
rather limited use (or even negative, in our case) to a counter that's
already restricted to ring 3.

It's much faster. The PMI cost goes down dramatically.

I still think the right fix is to add a perf event opt-out and let rr
use it.

V3 is without counter freezing.
V4 is with counter freezing.
The value is the average cost of the PMI handler.
(lower is better)

perf options                    V3 (ns)   V4 (ns)   delta
-c 100000                          1088       894    -18%
-g -c 100000                       1862      1646    -12%
--call-graph lbr -c 100000         3649      3367     -8%
--call-graph dwarf -c 100000       2248      1982    -12%

Is that measured on the same machine, i.e., do you force V3 on Skylake?

Yes, it's measured on the same Kaby Lake machine, with the counter_freezing option disabled (V3) and enabled (V4).


All it does, I think, is save one wrmsr(GLOBAL_CTRL) on entry to the
PMU interrupt handler, or am I missing something?
Or does it save two, counting the wrmsr(GLOBAL_CTRL) at the end to reactivate the counters?

__intel_pmu_disable_all() and __intel_pmu_enable_all() are not called in the V4 handler, so it saves at least two wrmsrl() calls.

Thanks,
Kan