Re: [PATCH v3 09/35] perf evlist: Propagate user CPU maps intersecting core PMU maps

From: Ian Rogers
Date: Fri May 26 2023 - 17:41:15 EST


On Wed, May 24, 2023 at 10:30 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> Hi Ian,
>
> On Wed, May 24, 2023 at 3:19 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > The CPU map for a non-core PMU gives a default CPU value for
> > perf_event_open. For core PMUs the CPU map lists all CPUs the evsel
> > may be opened on. If there are >1 core PMU, the CPU maps will list the
> > CPUs for that core PMU, but the user_requested_cpus may contain CPUs
> > that are invalid for the PMU and cause perf_event_open to fail. To
> > avoid this, when propagating the CPU map for core PMUs intersect it
> > with the CPU map of the PMU (the evsel's "own_cpus").
> >
> > Add comments to __perf_evlist__propagate_maps to explain its somewhat
> > complex behavior.
>
> Thanks for tackling this. There are many assumptions on this code
> which make this code hard to understand. I think we need to list
> all possible cases and make the logic as simple as possible.
>
> >
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > ---
> > tools/lib/perf/evlist.c | 25 ++++++++++++++++++++-----
> > 1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 81e8b5fcd8ba..b8b066d0dc5e 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -36,18 +36,33 @@ void perf_evlist__init(struct perf_evlist *evlist)
> > static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> > struct perf_evsel *evsel)
> > {
> > - /*
> > - * We already have cpus for evsel (via PMU sysfs) so
> > - * keep it, if there's no target cpu list defined.
> > - */
>
> So basically this code is only needed when the user specified a cpu list.
> Otherwise evsels can use their own cpus. But it's a kind of sad that
> libperf does not have a notion of PMU (with a cpu map) yet.
>
> I think we have the following cases. Please tell me if I miss some.
>
> 1. non-hybrid core PMU: It used to not have a cpu map, but you added it
> in this patchset to cover all (online) CPUs. So it'd be ok to treat them as
> same as the hybrid PMUs.
>
> 2. hybrid core PMU: It has a cpu map to cover possible CPUs and the
> user requested cpu map should be intersected with its map.

Right, and these two cases should be just considered as core PMU
cases, where there can be >1 core PMU.
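
As a rough illustration of the intersection for the >1 core PMU case, here is a minimal sketch that models a CPU map as a 64-bit mask. It is not libperf code: cpu_mask_t, cpu_range() and core_pmu_cpus() are hypothetical names made up for this example, and the real code uses perf_cpu_map__intersect() on struct perf_cpu_map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a CPU map is a 64-bit mask, bit N set means CPU N. */
typedef uint64_t cpu_mask_t;

/* Build a mask covering CPUs first..last inclusive. */
static cpu_mask_t cpu_range(unsigned int first, unsigned int last)
{
	cpu_mask_t mask = 0;

	assert(first <= last && last < 64);
	for (unsigned int cpu = first; cpu <= last; cpu++)
		mask |= (cpu_mask_t)1 << cpu;
	return mask;
}

/*
 * CPUs an evsel on a core PMU may be opened on: only the user requested
 * CPUs that the PMU itself covers. An empty result means none of the
 * requested CPUs are valid for this PMU.
 */
static cpu_mask_t core_pmu_cpus(cpu_mask_t user_requested, cpu_mask_t pmu_cpus)
{
	return user_requested & pmu_cpus;
}
```

For example, assuming a hybrid system where one core PMU covers CPUs 0-7 and the other covers CPUs 8-15, a user request for CPUs 6-9 would leave CPUs 6-7 for the first PMU and CPUs 8-9 for the second, so neither evsel is opened on a CPU its PMU doesn't support.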

> 3. uncore PMU: It has a cpu map to indicate CPUs to handle event
> settings but it's allowed to read the event from other CPUs (at least
> for Intel CPUs). That means it can just use the user request cpu map.

Yep, but uncore can be a confusing name: it means things not in the
core, yet it doesn't include, say, the interconnect, which Intel calls
offcore. In pmus.c at the end of this series we have a list of core
PMUs and other PMUs.

> 4. dummy event: It can be marked as system-wide to get the sideband
> events from all CPUs. Then it should ignore the user requested cpu
> map. Otherwise it should be treated as other events.

Agreed. Typically dummy is regarded as a software event but the PMU
for software events has an empty CPU map.

> 5. tool event: It's used for perf stat and has a hardcoded cpu map for
> CPU 0. Not sure if it can accept other CPUs but it seems we can ignore
> the user requested cpu map.

Tool events have their PMU type set to software, but then we special
case things prior to, say, displaying the name or reading a counter. The
CPU maps are never used to my knowledge, and the enable/running times
look questionable for user and system time.

> 6. other event: No restrictions. It can use the user requested cpu map.

Here there are software, tracepoint and breakpoint events and there is
no PMU provided CPU map. There are sysfs PMUs for these but they don't
provide a CPU map.

I think what stems from this is that the comment on the evsel's
system_wide is stale:

/*
* system_wide is for events that need to be on every CPU, irrespective
* of user requested CPUs or threads. Map propagation will set cpus to
* this event's own_cpus, whereby they will contribute to evlist
* all_cpus.
*/

If this were true then the software PMU's empty CPU map would be
copied to dummy events, when in fact every CPU is being requested. I'll
tweak the comment in v4.
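
To make the point concrete, here is a simplified model of the branch order in the patch, again with a CPU map as a 64-bit mask. It is a sketch under assumptions, not the libperf implementation: struct model_evsel and propagate() are hypothetical, the requires_cpu / "any CPU" condition is left out, and the online CPUs are passed in rather than read from the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpu_mask_t; /* hypothetical: bit N set means CPU N */

/* Hypothetical stand-in for the fields of perf_evsel used in propagation. */
struct model_evsel {
	bool system_wide;    /* wants every CPU, e.g. a sideband dummy event */
	bool is_pmu_core;    /* event is on a core PMU */
	bool has_own_cpus;   /* false when the PMU provides no CPU map */
	cpu_mask_t own_cpus; /* PMU provided map, may be empty */
};

/* Pick the CPUs for an evsel, following the order of the patch's branches. */
static cpu_mask_t propagate(const struct model_evsel *evsel, bool has_user_cpus,
			    cpu_mask_t user_cpus, cpu_mask_t online_cpus)
{
	assert(evsel);
	if (evsel->system_wide)
		return online_cpus;                 /* not own_cpus */
	if (has_user_cpus && evsel->is_pmu_core)
		return user_cpus & evsel->own_cpus; /* drop invalid CPUs */
	if (!evsel->has_own_cpus || has_user_cpus)
		return user_cpus;
	return evsel->own_cpus;
}
```

In this model a system_wide dummy event whose PMU map is empty still ends up with every online CPU, which is the behavior the stale comment fails to describe.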

Thanks,
Ian


> > if (evsel->system_wide) {
> > + /* System wide: set the cpu map of the evsel to all online CPUs. */
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__new(NULL);
> > + } else if (evlist->has_user_cpus && evsel->is_pmu_core) {
> > + /*
> > + * User requested CPUs on a core PMU, ensure the requested CPUs
> > + * are valid by intersecting with those of the PMU.
> > + */
> > + perf_cpu_map__put(evsel->cpus);
> > + evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->own_cpus);
> > } else if (!evsel->own_cpus || evlist->has_user_cpus ||
> > - (!evsel->requires_cpu && perf_cpu_map__empty(evlist->user_requested_cpus))) {
> > + (!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) {
> > + /*
> > + * The PMU didn't specify a default cpu map, this isn't a core
> > + * event and the user requested CPUs or the evlist user
> > + * requested CPUs have the "any CPU" (aka dummy) CPU value. In
> > + * which case use the user requested CPUs rather than the PMU
> > + * ones.
> > + */
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> > } else if (evsel->cpus != evsel->own_cpus) {
> > + /*
> > + * No user requested cpu map but the PMU cpu map doesn't match
> > + * the evsel's. Reset it back to the PMU cpu map.
> > + */
>
> Not sure if it actually happens.
>
> Thanks,
> Namhyung
>
>
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> > }
> > --
> > 2.40.1.698.g37aff9b760-goog
> >