Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter
From: Stephane Eranian
Date: Mon May 22 2017 - 14:15:28 EST
Hi,
On Mon, May 22, 2017 at 1:30 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, May 19, 2017 at 10:06:21AM -0700, kan.liang@xxxxxxxxx wrote:
>> From: Kan Liang <Kan.liang@xxxxxxxxx>
>>
>> The CPU ref_cycles can only be used by one user at the same time,
>> otherwise a "not counted" error will be displayed.
>> [kan]$ sudo perf stat -x, -e ref-cycles,ref-cycles -- sleep 1
>> 1203264,,ref-cycles,513112,100.00,,,,
>> <not counted>,,ref-cycles,0,0.00,,,,
>>
>> CPU ref_cycles can only be counted by fixed counter 2. It uses a
>> pseudo-encoding, which the GP counters do not recognize.
>>
>> BUS_CYCLES (0x013c) is another event which is not affected by core
>> frequency changes. It has a constant ratio with the CPU ref_cycles.
>> BUS_CYCLES could be used as an alternative event for ref_cycles on GP
>> counter.
>> A hook is implemented in x86_schedule_events. If the fixed counter 2 is
>> occupied and a GP counter is assigned, BUS_CYCLES is used to replace
>> ref_cycles. A new flag PERF_X86_EVENT_REF_CYCLES_REP in
>> hw_perf_event is introduced to indicate the replacement.
>> To make the switch transparent, counting and sampling are also specially
>> handled.
>> - For counting, it multiplies the result with the constant ratio after
>> reading it.
>> - For sampling with fixed period, the BUS_CYCLES period = ref_cycles
>> period / the constant ratio.
>> - For sampling with fixed frequency, the adaptive frequency algorithm
>> will figure it out on its own. Do nothing.
>>
>> The constant ratio is model specific.
>> For models after Nehalem but before Skylake, the ratio is defined in
>> MSR_PLATFORM_INFO.
>> For models after Skylake, it can be obtained from CPUID.15H.
>> For Knights Landing, Goldmont and later, the ratio is always 1.
>>
>> The old Silvermont/Airmont, Core2 and Atom machines are not covered by
>> the patch. The behavior on those machines will not change.
>
> Maybe I missed it, but *why* are we doing this?
Yes, I would like to understand the motivation for this added complexity
as well.

My guess is that you have a situation where ref-cycles is used constantly,
i.e., pinned, and therefore you lose the ability to count it for any other
user. This is the case when you switch the hard lockup detector (NMI
watchdog) to using ref-cycles instead of core cycles. This is what you are
doing in patch 2/2 actually.

Another scenario could be with virtual machines. KVM makes all guest
events use pinned events on the host. So if the guest is measuring
ref-cycles, then the host cannot. Well, I am hoping this is not the case
because as far as I remember system-wide pinned has higher priority than
per-process pinned.

You cannot make your change transparent in sampling mode. You are
adjusting the period with the ratio. If the user asks for the period to be
recorded in each sample, the modified period will be captured. If I say I
want to sample every 1M ref-cycles and I set event_attr.sample_type =
PERF_SAMPLE_PERIOD, then I expect to see 1M in each sample and not some
scaled value. So you need to address this problem, including in frequency
mode.
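
To make the sampling concern concrete, here is a minimal sketch in C of
the arithmetic involved. It is not code from the patch: the helper names
and the ratio values are hypothetical, chosen only for illustration. It
shows the period that would be programmed when BUS_CYCLES stands in for
ref-cycles, and the conversion back that would be needed before a value is
reported for PERF_SAMPLE_PERIOD.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from the patch.
 * `ratio` stands for the model-specific constant ratio between
 * ref_cycles and BUS_CYCLES; it must be non-zero.
 */

/* Period to program when a GP counter counts BUS_CYCLES instead. */
static uint64_t bus_cycles_period(uint64_t ref_period, uint64_t ratio)
{
	return ref_period / ratio;	/* integer division truncates */
}

/* Convert a BUS_CYCLES period back into ref-cycles units for reporting. */
static uint64_t reported_period(uint64_t hw_period, uint64_t ratio)
{
	return hw_period * ratio;
}
```

With an assumed ratio of 25, a requested period of 1M becomes 40000 in
hardware, and recording 40000 in the sample would be the wrong value. Note
also that when the ratio does not divide the period evenly, converting back
does not recover the requested value exactly, so recording the period the
user originally asked for may be the safer choice.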