RE: [PATCH V2 1/1] perf/x86: Add Intel power cstate PMUs support

From: Liang, Kan
Date: Thu Aug 06 2015 - 19:40:33 EST




> >> On Thu, Aug 6, 2015 at 1:25 PM, Liang, Kan <kan.liang@xxxxxxxxx> wrote:
> >> >
> >> >> >> >> >> +static cpumask_t power_cstate_core_cpu_mask;
> >> >> >> >> >
> >> >> >> >> > That one typically does not need a cpumask.
> >> >> >> >> >
> >> >> >> >> You need to pick one CPU out of the multi-core. But it is
> >> >> >> >> for client parts, thus there is only one socket. At least
> >> >> >> >> this is my understanding.
> >> >> >> >>
> >> >> >> >
> >> >> >> > CORE_C*_RESIDENCY counters are per physical processor core,
> >> >> >> > so the logical processors in the same physical core share
> >> >> >> > the same counter.
> >> >> >> > I think we need the cpumask to identify the default logical
> >> >> >> > processor that does the counting.
> >> >> >> >
> >> >> >> Did you restrict these events to system-wide mode only?
> >> >> >>
> >> >> Ok, so that means that your cpumask includes one HT per physical core.
> >> >> But then, the result is not the simple aggregation of all the N/2 CPUs.
> >> >
> >> > The counter counts per physical core. The result is the aggregation
> >> > of all HT CPUs in the same physical core.
> >>
> >> But then don't you need to divide by 2 to get a meaningful result?
> >
> > > On second thought, I think I was unclear about the aggregation of
> > > all HT CPUs in the same physical core.
> > >
> > > The physical core C-state should equal min(logical core C-state).
> > > So the physical core enters C6 only when all of its logical cores
> > > have entered C6, and only then does CORE_C6_RESIDENCY count.
> > >
> > > So if we count CORE_C6_RESIDENCY on only one logical core/HT,
> > > we don't need to divide by 2. The count is the residency while
> > > all logical cores are in C6 (some may be deeper).
> >
> Ok and here you are assuming you are only measuring one logical CPU per
> physical core. If this is the case, then I think you are alright. But I wonder
> what you'd get when perf stat -a aggregates across all measured CPUs, i.e.,
> one CPU per core.

perf stat -a just adds them all together.
I think we do the same thing for other PMUs as well.
For uncore or RAPL, we get a meaningful result by applying --per-socket.
Here we can use --per-core.
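
As a rough illustration of the counting model discussed above, here is a
minimal C sketch. It is a hypothetical user-space model, not code from the
patch: the helper names, the two-threads-per-core layout, and the use of
integer C-state depths per time slice are all assumptions made for the
example. It shows that the core state is the minimum of its logical CPUs'
states, so the residency is counted once per core and needs no division by
the number of HT siblings.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: two HT siblings per physical core. */
#define THREADS_PER_CORE 2

/*
 * The physical core C-state is the shallowest (minimum) C-state among
 * its logical CPUs.
 */
static int core_cstate(const int *thread_cstate, size_t nr_threads)
{
	assert(nr_threads > 0);

	int min = thread_cstate[0];

	for (size_t i = 1; i < nr_threads; i++)
		if (thread_cstate[i] < min)
			min = thread_cstate[i];
	return min;
}

/*
 * Count the time slices in which the whole core is in C6 or deeper.
 * This models one CORE_C6_RESIDENCY counter shared by all siblings,
 * so reading it from a single logical CPU gives the full core value.
 */
static unsigned long core_c6_residency(const int samples[][THREADS_PER_CORE],
				       size_t nr_samples)
{
	unsigned long residency = 0;

	for (size_t t = 0; t < nr_samples; t++)
		if (core_cstate(samples[t], THREADS_PER_CORE) >= 6)
			residency++;
	return residency;
}
```

Summing core_c6_residency() over all cores corresponds to the plain
system-wide total, while keeping the values separate corresponds to what
--per-core would report.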

Thanks,
Kan