Re: [Patch v2 3/5] perf x86/topdown: Don't move topdown metrics events when sorting events

From: Ian Rogers
Date: Wed Jul 10 2024 - 11:07:49 EST


On Wed, Jul 10, 2024 at 2:40 AM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
>
>
> On 7/10/2024 6:37 AM, Ian Rogers wrote:
> > On Mon, Jul 8, 2024 at 9:18 PM Mi, Dapeng <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >>
> >> On 7/8/2024 11:08 PM, Ian Rogers wrote:
> >>> On Mon, Jul 8, 2024 at 12:40 AM Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx> wrote:
> >>>> when running below perf command, we say error is reported.
> >>>>
> >>>> perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
> >>>>
> >>>> ------------------------------------------------------------
> >>>> perf_event_attr:
> >>>> type 4 (cpu)
> >>>> size 168
> >>>> config 0x400 (slots)
> >>>> sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> >>>> read_format ID|GROUP|LOST
> >>>> disabled 1
> >>>> sample_id_all 1
> >>>> exclude_guest 1
> >>>> ------------------------------------------------------------
> >>>> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
> >>>> ------------------------------------------------------------
> >>>> perf_event_attr:
> >>>> type 4 (cpu)
> >>>> size 168
> >>>> config 0x8000 (topdown-retiring)
> >>>> { sample_period, sample_freq } 4000
> >>>> sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> >>>> read_format ID|GROUP|LOST
> >>>> freq 1
> >>>> sample_id_all 1
> >>>> exclude_guest 1
> >>>> ------------------------------------------------------------
> >>>> sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8
> >>>> sys_perf_event_open failed, error -22
> >>>>
> >>>> Error:
> >>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
> >>>>
> >>>> The reason of error is that the events are regrouped and
> >>>> topdown-retiring event is moved to closely after the slots event and
> >>>> topdown-retiring event needs to do the sampling, but Intel PMU driver
> >>>> doesn't support to sample topdown metrics events.
> >>>>
> >>>> For topdown metrics events, it just requires to be in a group which has
> >>>> slots event as leader. It doesn't require topdown metrics event must be
> >>>> closely after slots event. Thus it's a overkill to move topdown metrics
> >>>> event closely after slots event in events regrouping and furtherly cause
> >>>> the above issue.
> >>>>
> >>>> Thus delete the code that moving topdown metrics events to fix the
> >>>> issue.
> >>> I think this is wrong. The topdown events may not be in a group, such
> >>> cases can come from metrics due to grouping constraints, and so they
> >>> must be sorted together so that they may be gathered into a group to
> >>> avoid the perf event opens failing for ungrouped topdown events. I'm
> >>> not understanding what these patches are trying to do, if you want to
> >>> prioritize the event for leader sampling why not modify it to compare
> >> Per my understanding, this change doesn't break anything. The events
> >> regrouping can be divided into below several cases.
> >>
> >> a. all events in a group
> >>
> >> perf stat -e "{instructions,topdown-retiring,slots}" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >> Performance counter stats for 'CPU(s) 0':
> >>
> >> 15,066,240 slots
> >> 1,899,760 instructions
> >> 2,126,998 topdown-retiring
> >>
> >> 1.045783464 seconds time elapsed
> >>
> >> In this case, slots event would be adjusted as the leader event and all
> >> events are still in same group.
> >>
> >> b. all events not in a group
> >>
> >> perf stat -e "instructions,topdown-retiring,slots" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >> Performance counter stats for 'CPU(s) 0':
> >>
> >> 2,045,561 instructions
> >> 17,108,370 slots
> >> 2,281,116 topdown-retiring
> >>
> >> 1.045639284 seconds time elapsed
> >>
> >> In this case, slots and topdown-retiring are placed into a group and slots
> >> is the group leader. instructions event is outside the group.
> >>
> >> c. slots event in group but topdown metric events outside the group
> >>
> >> perf stat -e "{instructions,slots},topdown-retiring" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >> Performance counter stats for 'CPU(s) 0':
> >>
> >> 20,323,878 slots
> >> 2,634,884 instructions
> >> 3,028,656 topdown-retiring
> >>
> >> 1.045076380 seconds time elapsed
> >>
> >> In this case, topdown-retiring event is placed into previous group and
> >> slots is adjusted to leader event.
> >>
> >> d. multiple event groups
> >>
> >> perf stat -e "{instructions,slots},{topdown-retiring}" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >> Performance counter stats for 'CPU(s) 0':
> >>
> >> 26,319,024 slots
> >> 2,427,791 instructions
> >> 2,683,508 topdown-retiring
> >>
> >> 1.045495830 seconds time elapsed
> >>
> >> In this case, the two groups are merged to one group and slots event is
> >> adjusted as leader.
> >>
> >> The key point of this patch is that it's unnecessary to move topdown
> >> metrics events closely after slots event. It's a overkill since Intel core
> >> PMU driver doesn't require that. Intel PMU driver just requires topdown
> >> metrics events are in a group where slots event is the group leader, and
> >> worse the movement for topdown metrics events causes the issue in the
> >> commit message mentioned.
> >>
> >> This patch doesn't block to regroup topdown metrics event. It just removes
> >> the unnecessary movement for topdown metrics events.
> > But you will get the same behavior because of the non-arch dependent
> > force group index - I guess you don't care as the sample read only
> > happens when you have a group.
> >
> > I'm thinking of cases like (which admittedly is broken):
> > ```
> > $ perf stat -e "{slots,instructions},cycles,topdown-fe-bound" -a sleep 0.1
> > [sudo] password for irogers:
> >
> > Performance counter stats for 'system wide':
> >
> > 2,589,345,900 slots
> > 852,492,838 instructions
> > 583,525,372 cycles
> > <not supported> topdown-fe-bound
> >
> > 0.103930790 seconds time elapsed
> > ```
>
> I run the upstream code (commit 73e931504f8e0d42978bfcda37b323dbbd1afc08)
> without this patchset, I see same issue.
>
> perf stat -e "{slots,instructions},cycles,topdown-fe-bound" -a sleep 0.1
>
> Performance counter stats for 'system wide':
>
> 262,448,922 slots
> 29,630,373 instructions
> 43,891,902 cycles
> <not supported> topdown-fe-bound
>
> 0.150369560 seconds time elapsed
>
> #perf -v
> perf version 6.10.rc6.g73e931504f8e
>
> This issue is not caused by this patchset.

I agree, but I think what is broken above was caused by the forced
grouping change that I did for Andi. The point of your change here is
to say that topdown events don't need to be moved while sorting, but
what should be happening here is just that. topdown-fe-bound should be
moved into the group with slots and instructions so it isn't "<not
supported>". So yes the current code has issues, but that's not the
same as saying we don't need to move topdown events, we do so that we
may group them better.

Thanks,
Ian

> > As the slots event is grouped there's no force group index on it, we
> > want to shuffle the topdown-fe-bound into the group so we want it to
> > compare as less than cycles - ie we're comparing topdown events with
> > non topdown events and trying to shuffle the topdown events first.
>
> Current evlist__cmp() won't really swap the order of cycles and
> topdown-fe-bound.
>
> if (lhs_sort_idx != rhs_sort_idx)
> return lhs_sort_idx - rhs_sort_idx;
>
> When comparing cycles and topdown-fe-bound events, lhs_sort_idx is 2 and
> rhs_sort_idx is 3, so the swap won't happen.
>
> So the event sequence after sorting is still "slots, instructions ,cycles,
> topdown-fe-bound". Both cycles and topdown-fe-bound events won't be placed
> into the group.
>
>
> >
> > Thanks,
> > Ian
> >
> >
> >
> >>> first?
> >>>
> >>> Thanks,
> >>> Ian
> >>>
> >>>> Signed-off-by: Dapeng Mi <dapeng1.mi@xxxxxxxxxxxxxxx>
> >>>> ---
> >>>> tools/perf/arch/x86/util/evlist.c | 5 -----
> >>>> 1 file changed, 5 deletions(-)
> >>>>
> >>>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> >>>> index 332e8907f43e..6046981d61cf 100644
> >>>> --- a/tools/perf/arch/x86/util/evlist.c
> >>>> +++ b/tools/perf/arch/x86/util/evlist.c
> >>>> @@ -82,11 +82,6 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> >>>> return -1;
> >>>> if (arch_is_topdown_slots(rhs))
> >>>> return 1;
> >>>> - /* Followed by topdown events. */
> >>>> - if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>>> - return -1;
> >>>> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> >>>> - return 1;
> >>>> }
> >>>>
> >>>> /* Default ordering by insertion index. */
> >>>> --
> >>>> 2.40.1
> >>>>