Re: [PATCH v1] perf evlist: Force adding default events only to core PMUs

From: Arnaldo Carvalho de Melo
Date: Thu Jun 06 2024 - 09:47:17 EST


On Wed, Jun 05, 2024 at 01:29:02PM -0700, Namhyung Kim wrote:
> On Thu, May 30, 2024 at 3:52 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > On Thu, May 30, 2024 at 06:46:08AM -0700, Ian Rogers wrote:
> > > On Thu, May 30, 2024 at 5:48 AM James Clark <james.clark@xxxxxxx> wrote:
> > > > On 30/05/2024 06:35, Namhyung Kim wrote:
> > > > > It might not be a perfect solution but it could be a simple one.
> > > > > Ideally I think it'd be nice if the kernel exports more information
> > > > > about the PMUs like sampling and exclude capabilities.

> > > > That seems like a much better suggestion. Especially with the ever
> > > > expanding retry/fallback mechanism that can never really take into
> > > > account every combination of event attributes that can fail.

> > > I think this approach can work but we may break PMUs.

> > > Rather than use `is_core` on `struct pmu` we could have say a
> > > `supports_sampling` and we pass to parse_events an option to exclude
> > > any PMU that doesn't have that flag. Now obviously more than just core
> > > PMUs support sampling. All software PMUs, tracepoints, probes. We have
> > > an imprecise list of these in perf_pmu__is_software. So we can set
> > > supports_sampling for perf_pmu__is_software and is_core.

> > Yep, we can do that if the kernel provides the info. But before that
> > I think it's practical to skip uncore PMUs and hope other PMUs don't
> > have event aliases clashing with the legacy names. :)

> > > I think the problem comes for things like the AMD IBS PMUs, intel_bts
> > > and intel_pt. Often these only support sampling but aren't core. There
> > > may be IBM S390 PMUs or other vendor PMUs that are similar. If we can
> > > make a list of all these PMU names then we can use that to set
> > > supports_sampling and not break event parsing for these PMUs.

> > > The name list sounds somewhat impractical, let's say we lazily compute
> > > the supports_sampling on a PMU. We need the sampling equivalent of
> > > is_event_supported:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/print-events.c?h=perf-tools-next#n242
> > > is_event_supported has had bugs, look at the exclude_guest workaround
> > > for Apple PMUs. It also isn't clear to me how we choose the event
> > > config that we're going to probe to determine whether sampling works.
> > > The perf_event_open may reject the test because of a bad config and
> > > not because sampling isn't supported.

> > > So I think we can make the approach work if we had either:
> > > 1) a list of PMUs that support sampling,
> > > 2) a reliable "is_sampling_supported" test.

> > > I'm not sure of the advantages of doing (2) rather than just creating
> > > the set of evsels and ignoring those that fail to open. Ignoring
> > > evsels that fail to open seems more unlikely to break anything as the
> > > user is giving the events/config values for the PMUs they care about.

> > Yep, that's also possible. I'm ok if you want to go that direction.

> Hmm.. I thought about this again. But it can be a problem if we ignore
> any failures as it can be a real error due to other reason - e.g. not
> supported configuration or other user mistakes.

Well, we should not ignore any failures, just look at
evsel__open_strerror(), we get some error from the kernel, we know we're
doing something, we put it into context, and then we try to provide the
most helpful message to the user.

The question here is how to figure out if what we want to do (sample)
makes sense for this event that has the name we used, since we're using
a "wildcard", a "well known event name" that should be translated to
events in all PMUs where it "matches", if those actual per-PMU events
can provide what the user is asking.

My suggestion about using pr_debug() is a compromise, one that will not
show warnings all the time for such a common case (cycles) while
allowing the use to follow a common, in tools/perf, way of diagnosing
when things don't look sane, which is to ask for verbose mode.

The best thing would be for us to be able to query, using existing
facilities, if what we want is possible, i.e. interpret the error coming
from the kernel in the context of what we are asking.

If this is not possible because the error is too generic (EINVAL) and
can map to multiple other things, then we can try to have a new kernel
facility for us to get this info in a authoritative way and use a
pr_warning() knowing that that warning will go away as the kernel is
updated.

Or we could use a big hammer, special handling some PMUs/events we know
can't sample and avoiding those when they fail, ugly, but should work,
and this is just for this "well known event names" people have been
using since forever. Those wanting specific events to be used should
know better and specify it precisely in the command line.

Does this sum up all the discussion so far?

Cheers,

- Arnaldo