Re: [Patch v2 3/5] perf x86/topdown: Don't move topdown metrics events when sorting events

From: Ian Rogers
Date: Tue Jul 09 2024 - 18:37:35 EST


On Mon, Jul 8, 2024 at 9:18 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>
>
> On 7/8/2024 11:08 PM, Ian Rogers wrote:
> > On Mon, Jul 8, 2024 at 12:40 AM Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >> When running the perf command below, an error is reported.
> >>
> >> perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
> >>
> >> ------------------------------------------------------------
> >> perf_event_attr:
> >> type 4 (cpu)
> >> size 168
> >> config 0x400 (slots)
> >> sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> >> read_format ID|GROUP|LOST
> >> disabled 1
> >> sample_id_all 1
> >> exclude_guest 1
> >> ------------------------------------------------------------
> >> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
> >> ------------------------------------------------------------
> >> perf_event_attr:
> >> type 4 (cpu)
> >> size 168
> >> config 0x8000 (topdown-retiring)
> >> { sample_period, sample_freq } 4000
> >> sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> >> read_format ID|GROUP|LOST
> >> freq 1
> >> sample_id_all 1
> >> exclude_guest 1
> >> ------------------------------------------------------------
> >> sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8
> >> sys_perf_event_open failed, error -22
> >>
> >> Error:
> >> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
> >>
> >> The error occurs because the events are regrouped and the
> >> topdown-retiring event is moved to immediately after the slots event.
> >> The topdown-retiring event then has to do the sampling, but the Intel PMU
> >> driver doesn't support sampling on topdown metrics events.
> >>
> >> Topdown metrics events only need to be in a group that has the slots
> >> event as leader. They are not required to come immediately after the
> >> slots event. Moving them there during event regrouping is therefore
> >> overkill, and it causes the above issue.
> >>
> >> Thus delete the code that moves topdown metrics events to fix the
> >> issue.
> > I think this is wrong. The topdown events may not be in a group, such
> > cases can come from metrics due to grouping constraints, and so they
> > must be sorted together so that they may be gathered into a group to
> > avoid the perf event opens failing for ungrouped topdown events. I'm
> > not understanding what these patches are trying to do, if you want to
> > prioritize the event for leader sampling why not modify it to compare
>
> Per my understanding, this change doesn't break anything. Event
> regrouping can be divided into the following cases.
>
> a. all events in a group
>
> perf stat -e "{instructions,topdown-retiring,slots}" -C0 sleep 1
> WARNING: events were regrouped to match PMUs
>
> Performance counter stats for 'CPU(s) 0':
>
> 15,066,240 slots
> 1,899,760 instructions
> 2,126,998 topdown-retiring
>
> 1.045783464 seconds time elapsed
>
> In this case, the slots event is made the leader event and all
> events are still in the same group.
>
> b. all events not in a group
>
> perf stat -e "instructions,topdown-retiring,slots" -C0 sleep 1
> WARNING: events were regrouped to match PMUs
>
> Performance counter stats for 'CPU(s) 0':
>
> 2,045,561 instructions
> 17,108,370 slots
> 2,281,116 topdown-retiring
>
> 1.045639284 seconds time elapsed
>
> In this case, slots and topdown-retiring are placed into a group and slots
> is the group leader. The instructions event is outside the group.
>
> c. slots event in group but topdown metric events outside the group
>
> perf stat -e "{instructions,slots},topdown-retiring" -C0 sleep 1
> WARNING: events were regrouped to match PMUs
>
> Performance counter stats for 'CPU(s) 0':
>
> 20,323,878 slots
> 2,634,884 instructions
> 3,028,656 topdown-retiring
>
> 1.045076380 seconds time elapsed
>
> In this case, the topdown-retiring event is placed into the preceding group
> and slots is made the leader event.
>
> d. multiple event groups
>
> perf stat -e "{instructions,slots},{topdown-retiring}" -C0 sleep 1
> WARNING: events were regrouped to match PMUs
>
> Performance counter stats for 'CPU(s) 0':
>
> 26,319,024 slots
> 2,427,791 instructions
> 2,683,508 topdown-retiring
>
> 1.045495830 seconds time elapsed
>
> In this case, the two groups are merged into one group and the slots event
> is made the leader.
>
> The key point of this patch is that it's unnecessary to move topdown
> metrics events to immediately after the slots event. It's overkill, since
> the Intel core PMU driver doesn't require that. The driver only requires
> that topdown metrics events are in a group where the slots event is the
> group leader. Worse, moving the topdown metrics events causes the issue
> described in the commit message.
>
> This patch doesn't prevent topdown metrics events from being regrouped.
> It just removes the unnecessary movement of topdown metrics events.

But you will get the same behavior because of the non-arch-dependent
force group index. I guess you don't care, as the sample read only
happens when you have a group.

I'm thinking of cases like this one (which admittedly is broken):
```
$ perf stat -e "{slots,instructions},cycles,topdown-fe-bound" -a sleep 0.1
[sudo] password for irogers:

Performance counter stats for 'system wide':

2,589,345,900 slots
852,492,838 instructions
583,525,372 cycles
<not supported> topdown-fe-bound

0.103930790 seconds time elapsed
```
As the slots event is grouped, there's no force group index on it. We
want to shuffle topdown-fe-bound into the group, so we want it to
compare as less than cycles, i.e. we're comparing topdown events with
non-topdown events and trying to shuffle the topdown events first.
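
Roughly, the ordering I mean can be modelled with the stand-alone sketch
below. It is not the perf code: struct ev, ev_cmp, sort_events and the
metrics_rule switch are made-up names that only mimic the slots-first and
topdown-metrics-next rules of arch_evlist__cmp, and the extra conditions
in the real function are left out.
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct evsel; only what the comparison needs. */
struct ev {
	const char *name;
	int idx;         /* insertion index, i.e. command-line order */
	bool is_slots;   /* the topdown slots event */
	bool is_metric;  /* a topdown metrics event, e.g. topdown-fe-bound */
};

/* true: keep the "followed by topdown events" rule; false: rule removed. */
static bool metrics_rule;

/* qsort() comparator modelling the ordering discussed in this thread. */
static int ev_cmp(const void *a, const void *b)
{
	const struct ev *lhs = a, *rhs = b;

	/* The slots event always sorts first. */
	if (lhs->is_slots)
		return -1;
	if (rhs->is_slots)
		return 1;

	/* Optionally, topdown metrics events sort ahead of everything else. */
	if (metrics_rule) {
		if (lhs->is_metric && !rhs->is_metric)
			return -1;
		if (!lhs->is_metric && rhs->is_metric)
			return 1;
	}

	/* Default ordering by insertion index. */
	return lhs->idx - rhs->idx;
}

/* Sort the events in place, with or without the metrics rule. */
static void sort_events(struct ev *evs, size_t n, bool with_metrics_rule)
{
	assert(evs != NULL);
	metrics_rule = with_metrics_rule;
	qsort(evs, n, sizeof(*evs), ev_cmp);
}
```
With the events from the example above in command-line order (slots,
instructions, cycles, topdown-fe-bound), sorting with the rule enabled
gives slots, topdown-fe-bound, instructions, cycles. With the rule
disabled the order is unchanged and topdown-fe-bound stays behind cycles.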

Thanks,
Ian



>
> > first?
> >
> > Thanks,
> > Ian
> >
> >> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> >> ---
> >> tools/perf/arch/x86/util/evlist.c | 5 -----
> >> 1 file changed, 5 deletions(-)
> >>
> >> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> >> index 332e8907f43e..6046981d61cf 100644
> >> --- a/tools/perf/arch/x86/util/evlist.c
> >> +++ b/tools/perf/arch/x86/util/evlist.c
> >> @@ -82,11 +82,6 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> >> return -1;
> >> if (arch_is_topdown_slots(rhs))
> >> return 1;
> >> - /* Followed by topdown events. */
> >> - if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >> - return -1;
> >> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> >> - return 1;
> >> }
> >>
> >> /* Default ordering by insertion index. */
> >> --
> >> 2.40.1
> >>