Re: [PATCH 0/3] Perf avoid opening events on offline CPUs

From: Ian Rogers
Date: Thu Jun 06 2024 - 03:05:14 EST


On Tue, Jun 4, 2024 at 1:04 AM Yicong Yang <yangyicong@xxxxxxxxxx> wrote:
>
> On 2024/6/4 0:42, Ian Rogers wrote:
> > On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@xxxxxxxxxx> wrote:
> >>
> >> From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
> >>
> >> If the user doesn't specify the CPUs, perf will try to open events on the
> >> PMU's CPUs, which are initialized from the PMU's "cpumask" or "cpus" sysfs
> >> attributes if provided. But we don't check whether the CPUs provided
> >> by the PMU are all online. So we may open events on offline CPUs if the PMU
> >> driver provides offline CPUs, and then we'll be rejected by the kernel:
> >>
> >> [root@localhost yang]# echo 0 > /sys/devices/system/cpu/cpu0/online
> >
> > Generally Linux won't let you take CPU0 offline; I'm not able to
> > follow this step on x86 Linux. Fwiw, I routinely run perf with the
> > core hyperthread siblings offline.
> >
>
> It doesn't matter whether it's CPU0 or another CPU that is offline. On arm64
> there's no restriction on taking CPU0 offline, and I just used it as an example.
>
> I cannot reproduce it on x86. I think it may be because we're initializing
> pmu->cpus through different paths in pmu_cpumask(). There's no "cpus"
> attribute for x86's core pmu on my x86 machine:
> root@ubuntu204:~# ls /sys/bus/event_source/devices/cpu/
> allow_tsx_force_abort caps events format freeze_on_smi perf_event_mux_interval_ms power rdpmc subsystem type uevent
>
> So pmu_cpumask() will infer that it is a core pmu and initialize the cpus
> with the online CPUs [1]. For arm64 there is a "cpus" sysfs attribute,
> so pmu->cpus is initialized from "cpus" without checking whether each
> CPU is online. Adding that check is what Patch 1/3 proposes.
>
> There's a "cpus" sysfs attribute on x86 hybrid machines, reading from the code [2].
> And it seems to always reflect the online CPUs supported by that PMU.

Thanks Yicong. Looking on a hybrid machine and taking cpu1 offline, I
see that the PMU's "cpus" does not contain the offline CPU. I think this
supports the view that the PMU driver should be fixed for ARM, but we'll
also need a workaround in the perf tool for older kernels.
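
To illustrate the kind of tool-side workaround meant here, below is a
minimal standalone sketch, not the actual perf code. It assumes the
PMU's "cpus" string and the contents of /sys/devices/system/cpu/online
have already been read into buffers. parse_cpu_list() and
pmu_usable_cpus() are hypothetical names made up for this example, and
the 64-CPU limit is a simplification for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a sysfs-style CPU list such as "0-3,8"
 * into a bitmask. CPU numbers of 64 and above are ignored in this sketch.
 */
static uint64_t parse_cpu_list(const char *list)
{
	uint64_t mask = 0;
	const char *p = list;

	assert(list != NULL);
	while (*p != '\0' && *p != '\n') {
		char *end;
		unsigned long first = strtoul(p, &end, 10);
		unsigned long last = first;

		if (end == p)
			break;		/* not a number: stop parsing */
		if (*end == '-') {	/* a range "first-last" */
			p = end + 1;
			last = strtoul(p, &end, 10);
			if (end == p)
				break;
		}
		for (unsigned long cpu = first; cpu <= last && cpu < 64; cpu++)
			mask |= UINT64_C(1) << cpu;
		p = (*end == ',') ? end + 1 : end;
	}
	return mask;
}

/*
 * Hypothetical helper: keep only the CPUs that the PMU advertises AND
 * that are currently online, so no event is opened on an offline CPU.
 */
static uint64_t pmu_usable_cpus(const char *pmu_cpus, const char *online)
{
	return parse_cpu_list(pmu_cpus) & parse_cpu_list(online);
}
```

With a PMU advertising "0-3" and only "1-3" online,
pmu_usable_cpus("0-3", "1-3") yields the mask for CPUs 1-3, so cpu0
would be skipped rather than rejected by the kernel.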

Thanks,
Ian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n779
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree//arch/x86/events/intel/core.c#n5736
>
> Thanks.