Re: [PATCH v6 1/2] perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu

From: James Clark
Date: Wed Jun 12 2024 - 09:13:03 EST




On 12/06/2024 13:32, Ian Rogers wrote:
> On Wed, Jun 12, 2024 at 4:19 AM Aishwarya TCV <aishwarya.tcv@xxxxxxx> wrote:
>>
>>
>>
>> On 15/05/2024 07:01, Ian Rogers wrote:
>>> The mrvl_ddr_pmu is uncore and has a hexadecimal address suffix while
>>> the previous PMU sorting/merging code assumes uncore PMU names start
>>> with uncore_ and have a decimal suffix. Because of the previous
>>> assumption it isn't possible to wildcard the mrvl_ddr_pmu.
>>>
>>> Modify pmu_name_len_no_suffix but also remove the suffix number out
>>> argument, this is because we don't know if a suffix number of say 100
>>> is in hexadecimal or decimal. As the only use of the suffix number is
>>> in comparisons, it is safe there to compare the values as hexadecimal.
>>> Modify perf_pmu__match_ignoring_suffix so that hexadecimal suffixes
>>> are ignored.
>>>
>>> Only allow hexadecimal suffixes to be greater than length 2 (ie 3 or
>>> more) so that S390's cpum_cf PMU doesn't lose its suffix.
>>>
>>> Change the return type of pmu_name_len_no_suffix to size_t to
>>> workaround GCC incorrectly determining the result could be negative.
>>>
>>> Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
>>> ---
>>> tools/perf/util/pmu.c | 33 +++++++++++++--------
>>> tools/perf/util/pmus.c | 67 ++++++++++++++++++++++++------------------
>>> tools/perf/util/pmus.h | 7 ++++-
>>> 3 files changed, 65 insertions(+), 42 deletions(-)
>>>
>>
>> Hi Ian,
>>
>> Perf test "perf_all_PMU_test" is failing when run against
>> next-master(next-20240612) kernel with Arm64 on JUNO in our CI. It looks
>> like it is failing when run on JUNO alone. Verified by running on other
>> boards like RB5 and Ampere_altra and confirming that it does not fail on
>> these boards. Suspecting that the suffixed 'armv8_pmuv3_0' naming could
>> be the reason of test failure.
>>
>> Reverting the change (3241d46f5f54) seems to fix it.
>>
>> This works fine on Linux version v6.10-rc3
>>
>> Failure log
>> ------------
>> 110: perf all PMU test:
>> --- start ---
>> test child forked, pid 8279
>> Testing armv8_pmuv3/br_immed_retired/
>> Event 'armv8_pmuv3/br_immed_retired/' not printed in:
>> # Running 'internals/synthesize' benchmark:
>> Computing performance of single threaded perf event synthesis by
>> synthesizing events on the perf process itself:
>> Average synthesis took: 1169.431 usec (+- 0.144 usec)
>> Average num. events: 35.000 (+- 0.000)
>> Average time per event 33.412 usec
>> Average data synthesis took: 1225.698 usec (+- 0.102 usec)
>> Average num. events: 119.000 (+- 0.000)
>> Average time per event 10.300 usec
>>
>> Performance counter stats for 'perf bench internals synthesize':
>>
>> 3263664785 armv8_pmuv3_0/br_immed_retired/
>>
>>
>> 25.472854464 seconds time elapsed
>>
>> 8.004791000 seconds user
>> 17.060209000 seconds sys
>> ---- end(-1) ----
>> 110: perf all PMU test :
>> FAILED!
>
> Hi Aishwarya,
>
> Thanks for reporting an issue. The test should be pretty self
> explanatory: it is doing a `perf stat -e
> armv8_pmuv3/br_immed_retired/` and then looking for that in the
> output. The event armv8_pmuv3/br_immed_retired/ comes from running
> perf list. As you can see in the output the event did work, so perf
> stat is working so nothing is actually broken here. What isn't working
> is the perf stat output matching the command line event and this is
> because of the unnecessary suffix on ARM's PMU name.
>
> We have a problem that ARM have buggy PMU drivers, either from
> introducing new naming conventions or by just being broken:
> https://lore.kernel.org/lkml/CAP-5=fWNDkOpnYF=5v1aQkVDrDWsmw+zYX1pjS8hoiYVgZsRGA@xxxxxxxxxxxxxx/
> I've also asked that ARM step up their testing, for example in the
> event parsing testing the PMU is hardcoded to the x86 PMU name of
> 'cpu':
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/parse-events.c?h=perf-tools-next#n2317
> On a cortex A53, then PMU is named 'armv8_cortex_a53':
> ```
> $ ls /sys/devices/armv8_cortex_a53/
> caps cpus events format perf_event_mux_interval_ms power
> subsystem type uevent
> ```
> This name appears better, so what's up with ARM's core PMU name?
> Anyway, I'm tempted to fix this by just skipping the test on ARM given
> ARM's overall broken state.
>

No need to skip the test, I'll take this one [1]. I think we're on the
same page that it's probably a test or cosmetic issue rather than
anything being broken in Perf.

[1]:
https://lore.kernel.org/linux-perf-users/8112b2bf-3950-4889-96bf-2317fbfb4bee@xxxxxxx/

> Further, can we escalate matters? It isn't right that these ARM issues
> are coming back to me. ARM aren't stepping up to fix not just the 2

I wouldn't read into it too much, I think this is just a courtesy email
pointing out that we're aware of an issue and tracked it back to a
particular commit. I don't think there is any expectation that it has to
be you that makes the fix and I'm happy to look into it.

> issues above but:
> 1) the broken Apple M? issue (ARM asked I fix this as it regressed due
> to my fixes for the Intel hybrid code):
> https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@xxxxxxxxxx/
> 2) opening events on BIG.little doesn't open the event on all core PMU types:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@xxxxxxxxxx/

As I mentioned in that thread I'll fix that one too at some point, I'm
just finishing off some other stuff first.

> 3) the broken 'cycles' event name in the arm_dsu PMU:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_dsu_pmu.c#n177
> 4) the ARM memory controller PMUs lacking cpumask causing events
> opened on them to be opened on every CPU and thereby induce
> multiplexing..
> and so on..
>
> Thanks,
> Ian
>
>> Thanks,
>> Aishwarya