Re: [GIT PULL] perf tools changes for v6.10

From: Ian Rogers
Date: Sat May 25 2024 - 11:33:10 EST


On Sat, May 25, 2024 at 7:09 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Fri, May 24, 2024 at 9:20 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, 24 May 2024 at 20:48, Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > >
> > > Thanks for the report. TL;DR try putting the PMU name with the event
> > > name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
> > > armv8_pmuv3_0 is the name of the PMU from /sys/devices.
> >
> > What? No.
> >
> > If that is the new rule, then I'm just going to revert. I'm not going
> > to use some random different PMU names across architectures when all I
> > want is just "cycles".
>
> The issue is that perf previously would do two things, if you asked
> for a 'cycles' event then as the name was a legacy one it would encode
> the type in the perf_event_attr as PERF_TYPE_HARDWARE and the config
> as PERF_COUNT_HW_CPU_CYCLES job done. With BIG.little/hybrid/.. those
> events now need opening on multiple PMUs otherwise you end up only
> sampling on either the big or the little core. On Intel with hybrid,
> cycles had to become cpu_core/cycles/ and cpu_atom/cycles/.
>
> If we look at an event name that isn't a legacy one, like
> inst_retired.any, then when no PMU is specified perf's behavior is to
> try to open this event on every PMU it can. On Intel this event will
> be found on the 'cpu' PMU, or on a hybrid Intel on cpu_core and
> cpu_atom. Were the event 'data_read' then it could be found on the
> uncore_imc_free_running_0 and uncore_imc_free_running_1 PMUs. My point
> here is that wildcarding an event to every PMU that supports it is
> arguably a feature as it avoids users needing to remember a specific
> PMU name. PMU names are often duplicated and it would be laborious to
> name them all.
>
> > In fact, the "cycles:pp" is a complete red herring. Just doing a simple
> >
> > $ perf record sleep 1
> >
> > with no explicit expression at all, results in that same
> >
> > Error:
> > cycles:P: PMU Hardware doesn't support sampling/overflow-interrupts.
> > Try 'perf stat'
> >
> > because perf *ITSELF* uses the sane format by default.
> >
> > So ABSOLUTELY NO. We're not changing the rules to "You have to know
> > some idiotic per-architecture JSON magic".
> >
> > I don't want any excuses like this. No "You are holding it wrong".
> > This is a BUG. Treat it as such.
> >
> > And yes, "perf record -vv" shows that it apparently picked some insane
> > arm_dsu_0 event. Which just shows that trying to parse the JSON rules
> > instead of just having sane defaults is clearly not working.
> >
> > So here's the deal: I will revert your commit tomorrow unless you show
> > that you are taking it seriously and have a sane fix.
> >
> > Because no, "You are holding it wrong" is not the answer. Never was,
> > never will be. Things used to work, they don't work any more. That
> > shit is not acceptable.
>
> I wasn't trying to say you were holding it wrong, I was trying to give
> you practical advice that would solve your problem whilst we determine
> what the right fix is.
>
> The question is what to do about events when no PMU is specified, here
> are the alternatives I see:
>
> 1) make legacy event names special and only open them on core PMUs,
> 2) make sure PMUs, like arm_dsu_0 in your case, don't advertise events
> matching legacy names like cycles,
> 3) make perf record only scan PMUs that support sampling,
> 4) in the case that we're using cycles:P as a default event, only open
> that on core PMUs (on Intel make cycles:P ->
> cpu_core/cycles/P,cpu_atom/cycles/P).
>
> A revert achieves (1) and I consider it a regression as now the
> behavior of 'cycles' doesn't match 'inst_retired.any' and the user
> somehow needs to know that certain event names are special.
> Changing the driver (2) is a workaround, but it feels like a burden on
> the driver writers.
> In order to determine whether a PMU supports sampling (3) we'd need to
> try probe using perf_event_open. The perf_event_open may fail due to
> things like host/guest permissions, etc. so we need fallbacks. We do
> something similar here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/print-events.c#n242
> (4) is the simplest change and I think it lowers the surprise from the
> behavior change in the patch. It means that were you to do "perf
> record -e cycles ..." you would still need to specify a PMU on Linus'
> special ARM box. We may lack permissions to read sysfs and so then it
> is hard to determine the PMU name, but the code was likely to fail
> then anyway.
>
> I'll write a patch to do (4), hopefully this qualifies as being
> serious, but it would be useful if the interested parties could work
> out what they think the behavior should be.

Patch sent:
https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@xxxxxxxxxx/

Thanks,
Ian

> Thanks,
> Ian
>
> > Linus
> >