Re: [PATCH v1] perf arm: Workaround ARM PMUs cpu maps having offline cpus

From: Ian Rogers
Date: Wed Jun 12 2024 - 07:52:34 EST


On Wed, Jun 12, 2024 at 3:16 AM Yicong Yang <yangyicong@xxxxxxxxxx> wrote:
>
> Hi Ian,
>
> On 2024/6/7 14:53, Ian Rogers wrote:
> > When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will
> > try to open events on those CPUs. ARM doesn't remove offline CPUs
> > meaning taking a CPU offline will cause perf commands to fail unless a
> > CPU map is passed on the command line.
> >
> > More context in:
> > https://lore.kernel.org/lkml/20240603092812.46616-1-yangyicong@xxxxxxxxxx/
> >
> > Reported-by: Yicong Yang <yangyicong@xxxxxxxxxx>
> > Closes: https://lore.kernel.org/lkml/20240603092812.46616-2-yangyicong@xxxxxxxxxx/
> > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
>
> Tested worked for this case:
>
> [root@localhost tmp]# echo 0 > /sys/devices/system/cpu/cpu1/online
> [root@localhost tmp]# /home/yang/perf_static stat -e armv8_pmuv3_0/cpu_cycles/ --timeout 100
> Error:
> The sys_perf_event_open() syscall returned with 19 (No such device) for event (armv8_pmuv3_0/cpu_cycles/).
> /bin/dmesg | grep -i perf may provide additional information.
>
> [root@localhost tmp]# /tmp/perf_Ian stat -e armv8_pmuv3_0/cpu_cycles/ --timeout 100
>
> Performance counter stats for 'system wide':
>
> 54994604 armv8_pmuv3_0/cpu_cycles/
>
> 0.176079800 seconds time elapsed
>
> So:
> Tested-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>
> But still wondering why isn't it better to move this into pmu_cpumask() as does in
> the previous patch? Yes currently this is a arm specific issue, but we cannot handle
> the case if later PMU doesn't update the cpus/cpumask either :)

Thanks Yicong. To the greatest extent possible I'm trying to avoid
unnecessary code during event parsing. On x86 there isn't a PMU that
has the buggy behavior of showing offline CPUs in the cpumask, so
computing the online CPUs and doing the intersection there is pure
overhead. The online CPUs calculation is either going to read a file
or probe via sysconf. The fallback sysconf probing may fail and can't
handle arbitrary holes in the CPU map (like in your example). So I'm
concerned about the overhead of doing this in pmu_cpumask and the
ability for it to break non-ARM platforms. I think the right
conservative thing to do is to just have the buggy cpumask fix for ARM
and work toward fixing the PMU drivers so potentially in the future we
can remove the fix there. Something I'd like to see is greater PMU
driver testing and detecting bugs, like:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/pmu.c?h=perf-tools-next#n285

Thanks,
Ian

> > ---
> > tools/perf/arch/arm/util/pmu.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
> > index 8b7cb68ba1a8..6b544edbd3f6 100644
> > --- a/tools/perf/arch/arm/util/pmu.c
> > +++ b/tools/perf/arch/arm/util/pmu.c
> > @@ -11,12 +11,15 @@
> >
> > #include "arm-spe.h"
> > #include "hisi-ptt.h"
> > +#include "../../../util/cpumap.h"
> > #include "../../../util/pmu.h"
> > #include "../../../util/cs-etm.h"
> > #include "../../arm64/util/mem-events.h"
> >
> > -void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> > +void perf_pmu__arch_init(struct perf_pmu *pmu)
> > {
> > + struct perf_cpu_map *intersect;
> > +
> > #ifdef HAVE_AUXTRACE_SUPPORT
> > if (!strcmp(pmu->name, CORESIGHT_ETM_PMU_NAME)) {
> > /* add ETM default config here */
> > @@ -33,6 +36,9 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> > pmu->selectable = true;
> > #endif
> > }
> > -
> > #endif
> > + /* Workaround some ARM PMU's failing to correctly set CPU maps for online processors. */
> > + intersect = perf_cpu_map__intersect(cpu_map__online(), pmu->cpus);
> > + perf_cpu_map__put(pmu->cpus);
> > + pmu->cpus = intersect;
> > }
> >