Re: [PATCH 4/9] perf/x86/intel: Support hardware TopDown metrics

From: Liang, Kan
Date: Tue May 28 2019 - 14:26:35 EST

On 5/28/2019 8:43 AM, Peter Zijlstra wrote:
On Tue, May 21, 2019 at 02:40:50PM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
The 8bit metrics ratio values lose precision when the measurement period
gets longer.

To avoid this we always reset the metric value when reading, as we
already accumulate the count in the perf count value.

For a long period read, low precision is acceptable.
For a short period read, the register will be reset often enough that it
is not a problem.
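
To illustrate the precision point above, here is a minimal sketch, not code from the patch. It assumes PERF_METRICS packs its ratios as one 8-bit field per byte, each a fraction n/255 of SLOTS. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: metric i occupies byte i of PERF_METRICS (i in 0..3). */
static inline uint8_t metric_field(uint64_t perf_metrics, unsigned int i)
{
	return (uint8_t)(perf_metrics >> (8 * i));
}

/*
 * Scale an 8-bit fraction (frac/255) by the slots count.
 * Assumes slots fits in 48 bits, so slots * frac cannot overflow 64 bits.
 */
static inline uint64_t metric_to_slots(uint64_t slots, uint8_t frac)
{
	return slots * frac / 0xff;
}

/* Slots covered by a single step of the 8-bit field. */
static inline uint64_t metric_resolution(uint64_t slots)
{
	return slots / 0xff;
}
```

One step of a field covers slots / 255, so the longer SLOTS runs without a reset, the coarser each metric becomes. Resetting both registers on every read keeps that step small.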

PERF_METRICS may report a wrong value if its delta is less than 1/255
of SLOTS (fixed counter 3).

To avoid this, the PERF_METRICS and SLOTS registers have to be reset
simultaneously. The slots value has to be cached as well.

That doesn't sound like it is NMI-safe.

The TopDown can be collected per thread/process. To use TopDown
through RDPMC in applications on Icelake, the metrics and slots values
have to be saved/restored during context switching.

Add a specific set_period() to handle the slots and metrics events,
because:
- The initial value must be 0.
- The value only needs to be restored on context switch. In other cases,
the counters have already been cleared after the read.

So the above claims to explain RDPMC, but doesn't mention that magic
value below at all. In fact, I don't see how the above relates to RDPMC
at all.

Current perf only supports per-core Topdown RDPMC. On Icelake, it can be extended to per-thread Topdown RDPMC.
The changelog tries to explain the extra work needed for per-thread Topdown RDPMC, e.g. saving/restoring the slots and metrics values on context switch.

@@ -2141,7 +2157,9 @@ static int x86_pmu_event_idx(struct perf_event *event)
if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
return 0;
- if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
+ if (is_metric_idx(idx))
+ idx = 1 << 29;

I can't find this in the SDM RDPMC description. What does it return?

With bit 29 set in the index, RDPMC will return the value of PERF_METRICS. I will add that to the changelog.
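
As a rough sketch of the index encoding in the hunk (not the patch code itself): bit 30 selects a fixed counter and bit 29 selects PERF_METRICS. The enum, macro and function names below are made up for illustration, and the fixed case assumes n is the fixed counter number rather than the perf idx.

```c
#include <stdint.h>

#define RDPMC_FIXED_FLAG	(1u << 30)	/* fixed-function counter */
#define RDPMC_METRICS_FLAG	(1u << 29)	/* PERF_METRICS, per the hunk above */

/* Hypothetical classification of what the caller wants to read. */
enum pmc_kind { PMC_GENERAL, PMC_FIXED, PMC_METRICS };

/* Build the counter index that would be loaded into ECX for RDPMC. */
static inline uint32_t rdpmc_ecx(enum pmc_kind kind, uint32_t n)
{
	switch (kind) {
	case PMC_METRICS:
		return RDPMC_METRICS_FLAG;	/* n is ignored */
	case PMC_FIXED:
		return RDPMC_FIXED_FLAG | n;	/* n = fixed counter number */
	default:
		return n;			/* general-purpose counter n */
	}
}
```

For example, rdpmc_ecx(PMC_FIXED, 3) gives 0x40000003 for SLOTS (fixed counter 3), and rdpmc_ecx(PMC_METRICS, 0) gives 0x20000000.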


+ else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
idx |= 1 << 30;