Re: [PATCH v4 3/4] perf/core: Remove pmu linear searching code

From: Ian Rogers
Date: Wed May 31 2023 - 16:20:25 EST


On Wed, May 31, 2023 at 2:09 AM Thomas Richter <tmricht@xxxxxxxxxxxxx> wrote:
>
> On 5/30/23 16:00, Ian Rogers wrote:
> > ls /sys/devices/*/cpu*
>
> Hi Ian,
>
> here is the output yo requested:
>
> # ls /sys/devices/*/cpu*
> cpu0 cpu3 cpu6 hotplug modalias possible smt
> cpu1 cpu4 cpu7 isolated offline present uevent
> cpu2 cpu5 dispatching kernel_max online rescan vulnerabilities
> #
>
> In fact it is the same as
> # ls /sys/devices/system/cpu/
> cpu0 cpu3 cpu6 hotplug modalias possible smt
> cpu1 cpu4 cpu7 isolated offline present uevent
> cpu2 cpu5 dispatching kernel_max online rescan vulnerabilities
> #
> This directory tree has nothing to do with events for perf, it is
> merely used to support CPU hotplug on s390.
>
> The PMUs on s390 are
> # ls -ld /sys/devices/{cpum_,pai_}*
> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/cpum_cf
> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/cpum_cf_diag
> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/cpum_sf
> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/pai_crypto
> drwxr-xr-x 4 root root 0 May 25 15:45 /sys/devices/pai_ext
> #
>
> I hope his helps.

Thanks Thomas,

The perf tool has a notion of "core" and "other" PMUs - other means
uncore and things like interconnect PMUs. The distinction matters as
PMUs may have a list of CPUs with them, for "other" PMUs the CPU list
(known in the tool as the CPU map) is a suggestion of the CPU to open
events on. For example, on my laptop:
```
$ cat /sys/devices/system/cpu/online
0-15
$ cat /sys/devices/uncore_imc_free_running_0/cpumask
0
```
So I have 16 "CPUs" and the memory controller is suggesting opening
the events on CPU 0. However, if I do:
```
$ sudo perf stat -e 'uncore_imc_free_running_0/data_read/' -C 8 -a -A sleep 1

Performance counter stats for 'system wide':

CPU8 4,617.60 MiB
uncore_imc_free_running_0/data_read/


1.001094684 seconds time elapsed
```
Then things are good and the event was recorded on CPU 8, even though
the cpumask of the PMU only said CPU 0.

For "core" PMUs the CPU map works differently. A CPU outside of the
map is an error. For example, if I have a heterogeneous system with 2
CPUs, the first CPU on 1 PMU and the second CPU on a different PMU, if
I try to open events for the wrong PMU type for the CPU then it
should fail. The CPU map should be interpreted as the set of CPUs
events are valid on.

The logic to determine if a PMU is "core" or "other" is here:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1417
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n609

That is, a PMU is a "core" PMU if it is called "cpu" or if there is a
file called "cpus" inside the PMU's sysfs directory. It seems on s390
none of the PMUs are considered "core" and this is why we're having
issues.

Looking at:
https://lore.kernel.org/lkml/20180416132314.33249-1-tmricht@xxxxxxxxxxxxx/

It would seem to make sense that "cpu", "cpum_cf" and "cpum_sf" should
all cause "is_pmu_core" to return true, From your output I suspect the
issue is that cpum_cf and cpum_sf both lack the "cpus" file - which is
also true on homogeneous x86's "cpu" PMU. We can fix the code with:

```
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 0520aa9fe991..bdc3f3b148fc 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1414,7 +1414,9 @@ void perf_pmu__del_formats(struct list_head *formats)

bool is_pmu_core(const char *name)
{
- return !strcmp(name, "cpu") || is_sysfs_pmu_core(name);
+ return !strcmp(name, "cpu") ||
+ !strcmp(name, "cpum_cf") || !strcmp(name, "cpum_sf") ||
+ is_sysfs_pmu_core(name);
}

bool perf_pmu__supports_legacy_cache(const struct perf_pmu *pmu)
```

Wdyt? Thanks,
Ian

> --
> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
> --
> Vorsitzender des Aufsichtsrats: Gregor Pillen
> Geschäftsführung: David Faller
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
>