Re: [PATCH 2/3] perf: arm_pmu: Only show online CPUs in device's "cpus" attribute

From: Ian Rogers
Date: Mon Jun 03 2024 - 12:21:05 EST


On Mon, Jun 3, 2024 at 2:33 AM Yicong Yang <yangyicong@xxxxxxxxxx> wrote:
>
> From: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>
> When there're CPUs offline after system booting, perf will failed:
> [root@localhost ~]# /home/yang/perf stat -a -e armv8_pmuv3_0/cycles/
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (cpu-clock).
> /bin/dmesg | grep -i perf may provide additional information.

Thanks for debugging this Yicong! The fact cycles is falling back on
cpu-clock I'm confused by, on ARM the PMU type generally isn't
PERF_TYPE_HARDWARE and so this fallback shouldn't happen:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n2900
I note your command line is for system wide event opening rather than
for a process. It is more strange the fallback is giving "No such
device".

> This is due to PMU's "cpus" is not updated and still contains offline
> CPUs and perf will try to open perf event on the offlined CPUs.

The perf tool will try to only open online CPUs. Unfortunately the
naming around this is confusing:

- any - an event may follow a task and schedule on "any" CPU. We
encode this with -1.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n24
- online - the set of online CPU.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n33
- all - I try to avoid this but it may still be in the code, "all"
usually is another name for online. Hopefully when we use this name it
is clearly defined:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/internal/evlist.h?h=perf-tools-next#n23

> Make "cpus" attribute only shows online CPUs and introduced a new
> "supported_cpus" where users can get the range of the CPUs this
> PMU supported monitoring.

I don't think this should be necessary as by default the CPUs the perf
tool will use will be the "online" CPUs:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evlist.c?h=perf-tools-next#n40

Could you create a reproduction of the problem you are encountering?
My expectation is for a core PMU to advertise the online and offline
CPUs it is valid for, and for perf to intersect:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/include/perf/cpumap.h?h=perf-tools-next#n44
those CPUs with the online CPUs so the perf_event_open doesn't fail.

Thanks,
Ian

> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
> ---
> drivers/perf/arm_pmu.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 8458fe2cebb4..acbb0e1d0414 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -558,13 +558,35 @@ static ssize_t cpus_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> {
> struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
> - return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> + cpumask_var_t mask;
> + ssize_t n;
> +
> + /* If allocation failed then show the supported_cpus */
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +
> + cpumask_and(mask, &armpmu->supported_cpus, cpu_online_mask);
> + n = cpumap_print_to_pagebuf(true, buf, mask);
> + free_cpumask_var(mask);
> +
> + return n;
> }
>
> static DEVICE_ATTR_RO(cpus);
>
> +static ssize_t supported_cpus_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct arm_pmu *armpmu = to_arm_pmu(dev_get_drvdata(dev));
> +
> + return cpumap_print_to_pagebuf(true, buf, &armpmu->supported_cpus);
> +}
> +
> +static DEVICE_ATTR_RO(supported_cpus);
> +
> static struct attribute *armpmu_common_attrs[] = {
> &dev_attr_cpus.attr,
> + &dev_attr_supported_cpus.attr,
> NULL,
> };
>
> --
> 2.24.0
>