Re: [PATCH v3 4/5] perf evlist: Respect all_cpus when setting user_requested_cpus

From: Adrian Hunter
Date: Thu Apr 28 2022 - 16:16:07 EST


On 8/04/22 06:56, Ian Rogers wrote:
> If all_cpus is calculated it represents the merge/union of all
> evsel cpu maps. By default user_requested_cpus is computed to be
> the online CPUs. For uncore events, it is often the case currently
> that all_cpus is a subset of user_requested_cpus. Metrics printed
> without aggregation and with metric-only, in print_no_aggr_metric,
> iterate over user_requested_cpus assuming every CPU has a metric to
> print. For each CPU the prefix is printed, but then if the
> evsel's cpus doesn't contain anything you get an empty line like
> the following on a 2 socket 36 core SkylakeX:
>
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
> 1.000453137 CPU0 0.00
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137 CPU18 0.00
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 1.000453137
> 2.003717143 CPU0 0.00
> ...
> ```
>
> While it is possible to be lazier in printing the prefix and
> trailing newline, having user_requested_cpus not be a subset of
> all_cpus is preferential so that wasted work isn't done elsewhere
> user_requested_cpus is used. The change modifies user_requested_cpus
> to be the intersection of user specified CPUs, or default all online
> CPUs, with the CPUs computed through the merge of all evsel cpu maps.
>
> New behavior:
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
> 1.001086325 CPU0 0.00
> 1.001086325 CPU18 0.00
> 2.003671291 CPU0 0.00
> 2.003671291 CPU18 0.00
> ...
> ```
>
> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> ---
> tools/perf/util/evlist.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 52ea004ba01e..196d57b905a0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1036,6 +1036,13 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
> if (!cpus)
> goto out_delete_threads;
>
> + if (evlist->core.all_cpus) {
> + struct perf_cpu_map *tmp;
> +
> + tmp = perf_cpu_map__intersect(cpus, evlist->core.all_cpus);

Isn't an uncore PMU represented as being on CPU0 actually
collecting data that can be due to any CPU.

Or for an uncore PMU represented as being on CPU0-CPU4 on a
4 core 8 hyperthread processor, actually 1 PMU per core.

So I am not sure intersection makes sense.

Also it is not obvious what happens with hybrid CPUs or
per thread recording.

> + perf_cpu_map__put(cpus);
> + cpus = tmp;
> + }
> evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid;
>
> perf_evlist__set_maps(&evlist->core, cpus, threads);