Re: [PATCH 1/3] perf pmu: Limit PMU cpumask to online CPUs

From: Ian Rogers
Date: Fri Jun 07 2024 - 03:09:39 EST


On Thu, Jun 6, 2024 at 5:36 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Thu, Jun 6, 2024 at 5:09 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > On Mon, Jun 03, 2024 at 09:52:15AM -0700, Ian Rogers wrote:
> > > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@xxxxxxxxxx> wrote:
> > > >
> > > > From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
> > > >
> > > > We'll initialize the PMU's cpumask from "cpumask" or "cpus" sysfs
> > > > attributes if provided by the driver without checking the CPUs
> > > > are online or not. In such case that CPUs provided by the driver
> > > > contains the offline CPUs, we'll try to open event on the offline
> > > > CPUs and then rejected by the kernel:
> > > >
> > > > [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> > > > [root@localhost yang]# ./perf_static stat -e armv8_pmuv3_0/cycles/ --timeout 100
> > > > Error:
> > > > The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> > > > /bin/dmesg | grep -i perf may provide additional information.
> > > >
> > > > So it's better to do a double check in the userspace and only include
> > > > the online CPUs from "cpumask" or "cpus" to avoid opening events on
> > > > offline CPUs.
> > >
> > > I see where you are coming from with this but I think it is wrong. The
> > > cpus for an uncore PMU are a hint of the CPU to open on rather than
> > > the set of valid CPUs. For example:
> > > ```
> > > $ cat /sys/devices/uncore_imc_free_running_0/cpumask
> > > 0
> > > $ perf stat -vv -e uncore_imc_free_running_0/data_read/ -C 1 -a sleep 0.1
> > > Using CPUID GenuineIntel-6-8D-1
> > > Attempt to add: uncore_imc_free_running_0/data_read/
> > > ..after resolving event: uncore_imc_free_running_0/event=0xff,umask=0x20/
> > > Control descriptor is not initialized
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > > type 24 (uncore_imc_free_running_0)
> > > size 136
> > > config 0x20ff (data_read)
> > > sample_type IDENTIFIER
> > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > disabled 1
> > > inherit 1
> > > exclude_guest 1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
> > > sys_perf_event_open failed, error -22
> > > switching off cloexec flag
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > > type 24 (uncore_imc_free_running_0)
> > > size 136
> > > config 0x20ff (data_read)
> > > sample_type IDENTIFIER
> > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > disabled 1
> > > inherit 1
> > > exclude_guest 1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0
> > > sys_perf_event_open failed, error -22
> > > switching off exclude_guest, exclude_host
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > > type 24 (uncore_imc_free_running_0)
> > > size 136
> > > config 0x20ff (data_read)
> > > sample_type IDENTIFIER
> > > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > disabled 1
> > > inherit 1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0 = 3
> > > uncore_imc_free_running_0/data_read/: 1: 4005984 102338957 102338957
> > > uncore_imc_free_running_0/data_read/: 4005984 102338957 102338957
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 244.51 MiB uncore_imc_free_running_0/data_read/
> > >
> > > 0.102320376 seconds time elapsed
> > > ```
> > > So the CPU mask of the PMU says to open on CPU 0, but on the command
> > > line when I passed "-C 1" it opened it on CPU 1. If the cpumask file
> > > contained an offline CPU then this change would make it so the CPU map
> > > in the tool were empty, however, a different CPU may be programmable
> > > and online.
> >
> > I think Intel uncore PMU driver ignores the CPU parameter and set it to
> > CPU 0 in this case internally. See uncore_pmu_event_init() at
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/uncore.c#n761
>
> Hmm.. maybe that's just the option if not set. Wrt hot plugging, on a
> 2 socket skylake:
> ```
> $ cat /sys/devices/uncore_imc_0/cpumask
> 0,18
> $ echo 0 > /sys/devices/system/cpu/cpu18/online
> $ cat /sys/devices/uncore_imc_0/cpumask
> 0,19
> ```
> So the cpumask should be reflecting the online/offline nature of CPUs.
>
> > >
> > > Fwiw, the tool will determine whether the mask is for all valid or a
> > > hint by using the notion of a PMU being "core" or not. That notion
> > > considers whether the mask was loading from a "cpumask" or "cpus"
> > > file:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n810
> > >
> > > Thanks,
> > > Ian
> > >
> > > > Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
> > > > ---
> > > > tools/perf/util/pmu.c | 13 +++++++++++--
> > > > 1 file changed, 11 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > > index 888ce9912275..51e8d10ee28b 100644
> > > > --- a/tools/perf/util/pmu.c
> > > > +++ b/tools/perf/util/pmu.c
> > > > @@ -771,8 +771,17 @@ static struct perf_cpu_map *pmu_cpumask(int dirfd, const char *name, bool is_cor
> > > > continue;
> > > > cpus = perf_cpu_map__read(file);
> > > > fclose(file);
> > > > - if (cpus)
> > > > - return cpus;
> > > > + if (cpus) {
> > > > + struct perf_cpu_map *intersect __maybe_unused;
> > > > +
> > > > + if (perf_cpu_map__is_subset(cpu_map__online(), cpus))
> > > > + return cpus;
> > > > +
> > > > + intersect = perf_cpu_map__intersect(cpus, cpu_map__online());
> >
> > So IIUC this is for core PMUs with "cpus" file, right? I guess uncore
> > drivers already handles "cpumask" properly..
>
> So I think this is an ARM specific bug:
>
> Core PMUs:
> x86 uses /sys/devices/cpu, s390 uses cpum_cf, these lack a cpus or
> cpumask and so we default to opening events on all online processors.
> The fact these are core PMUs is hardcoded in the tool:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n1747
> x86 hybrid /sys/devices/cpu_(core|atom)/cpus - the set of CPUs is
> updated to reflect online and offline
> ARM the /sys/devices/armv8_pmuv3_0 isn't a hardcoded core PMU and so
> we expect the cpus to contain online CPUs, but it currently also
> erroneously contains offline ones.
>
> Uncore PMUs:
> x86 has /sys/devices/<uncore...>/cpumask where the cpumask reflects
> online and offline CPUs
> ARM has things like the dmc620 PMU, it appears to be missing cpumask
> in certain cases leading to the perf tool treating it like the x86
> core PMU and opening events for it on every CPU:
> ```
> # ls /sys/devices/arm_dmc620_10008c000/
> events format perf_event_mux_interval_ms power subsystem type uevent
> ```
>
> I think we need a PMU test so that bugs like this can be reported, but
> we may also need to add tool workarounds for PMUs that are broken. I
> can imagine it will be tricky to test uncore PMUs that default to CPU0
> as often we can't offline that.

I think we should make this an ARM specific fix up like:
https://lore.kernel.org/lkml/20240607065343.695369-1-irogers@xxxxxxxxxx/
The PMU needs fixing up like in the rest of the change, but as perf
can be run on older kernels I think this workaround will remain
necessary. The arm_cmn PMU appears to handle CPU hot plugging, the
arm_dmc620 lacks a cpumask altogether on the test machine I can
access. I suspect we may want a better uncore fix as we may change a
CPU map of 1 CPU into an empty CPU map, for example, if the cpumask is
"0" and CPU0 is offline, then "1" would be a better alternative than
the empty CPU map. I couldn't find a PMU to test this with.

Thanks,
Ian

> Thanks,
> Ian
>
> > Thanks,
> > Namhyung
> >
> >
> > > > + perf_cpu_map__put(cpus);
> > > > + if (intersect)
> > > > + return intersect;
> > > > + }
> > > > }
> > > >
> > > > /* Nothing found, for core PMUs assume this means all CPUs. */
> > > > --
> > > > 2.24.0
> > > >