RE: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events

From: Song Bao Hua (Barry Song)
Date: Tue Oct 06 2020 - 02:51:18 EST




> -----Original Message-----
> From: Jiri Olsa [mailto:jolsa@xxxxxxxxxx]
> Sent: Friday, October 2, 2020 10:00 PM
> To: Namhyung Kim <namhyung@xxxxxxxxxx>; liwei (GF)
> <liwei391@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>; Andi Kleen <ak@xxxxxxxxxxxxxxx>;
> Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>; Alexey Budankov
> <alexey.budankov@xxxxxxxxxxxxxxx>; Adrian Hunter
> <adrian.hunter@xxxxxxxxx>; Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>;
> linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>; Peter Zijlstra
> <peterz@xxxxxxxxxxxxx>; Andi Kleen <andi@xxxxxxxxxxxxxx>; Libin (Huawei)
> <huawei.libin@xxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu
> events
>
> On Thu, Sep 24, 2020 at 11:36:23PM +0900, Namhyung Kim wrote:
> > On Wed, Sep 23, 2020 at 10:19:00PM +0200, Jiri Olsa wrote:
> > > On Wed, Sep 23, 2020 at 11:15:06PM +0900, Namhyung Kim wrote:
> > > > I think the problem is that armv8_pmu has a cpumask,
> > > > and the user requested per-task events.
> > > >
> > > > The code tried to open the event with a dummy cpu map
> > > > since it's not a cpu event, but the pmu has cpu map and
> > > > it's passed to evsel. So there's confusion somewhere
> > > > whether it should use evsel->cpus or a dummy map.
> > >
> > > you're right, I have following cpus file in pmu:
> > >
> > > # cat /sys/devices/armv8_pmuv3_0/cpus
> > > 0-3
> > >
> > > covering all the cpus.. and once you have cpumask/cpus file,
> > > you're system wide by default in current code, but we should
> > > not crash ;-)
> > >
> > > I tried to cover this case in patch below and I probably broke
> > > some other use cases, but perhaps we could allow to open counters
> > > per cpus for given workload
> > >
> > > I'll try to look at this more tomorrow
> >
> > I'm thinking about a different approach, we can ignore cpu map
> > for the ARM cpu PMU and use the dummy, not tested ;-)
> >
> > Thanks
> > Namhyung
> >
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 2208444ecb44..cfcdbd7be066 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> >  	if (!evsel->own_cpus || evlist->has_user_cpus) {
> >  		perf_cpu_map__put(evsel->cpus);
> >  		evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > +	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> > +		perf_cpu_map__put(evsel->cpus);
> > +		evsel->cpus = perf_cpu_map__get(evlist->cpus);
> >  	} else if (evsel->cpus != evsel->own_cpus) {
> >  		perf_cpu_map__put(evsel->cpus);
> >  		evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> >
>
> Wei Li,
> is this fixing your problem?

I have seen the same crash and suggested a different patch for it:
[PATCH] perf evlist: fix memory corruption for Kernel PMU event
https://lore.kernel.org/lkml/20201001115729.27116-1-song.bao.hua@xxxxxxxxxxxxx/

Also, I have tested Namhyung Kim's patch on ARM64. It does fix the crash for me. So:
Tested-by: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
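
For anyone following the thread, here is a rough standalone sketch of how I
read the new branch. It is only a toy model: the toy_* types and the map_*
helpers are hypothetical stand-ins, not the real libperf definitions, and
map_empty() assumes that an "empty" map means either NULL or the dummy map
whose first entry is -1. The three branches mirror the patched
__perf_evlist__propagate_maps() quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the libperf structures (hypothetical, simplified). */
struct toy_cpu_map {
	int refcnt;
	int nr;
	int cpu[4];		/* cpu[0] == -1 marks the dummy "any CPU" map */
};

struct toy_evsel {
	struct toy_cpu_map *cpus;	/* map actually used to open the event */
	struct toy_cpu_map *own_cpus;	/* map taken from the PMU's sysfs cpus file */
	bool system_wide;
};

struct toy_evlist {
	struct toy_cpu_map *cpus;
	bool has_user_cpus;
};

static struct toy_cpu_map *map_get(struct toy_cpu_map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

static void map_put(struct toy_cpu_map *m)
{
	if (m) {
		assert(m->refcnt > 0);
		m->refcnt--;
	}
}

/* Assumption: "empty" means no map at all, or the dummy map. */
static bool map_empty(const struct toy_cpu_map *m)
{
	return !m || m->cpu[0] == -1;
}

/* Same three branches as the patched function, on the toy types. */
static void toy_propagate_maps(struct toy_evlist *evlist, struct toy_evsel *evsel)
{
	if (!evsel->own_cpus || evlist->has_user_cpus) {
		map_put(evsel->cpus);
		evsel->cpus = map_get(evlist->cpus);
	} else if (!evsel->system_wide && map_empty(evlist->cpus)) {
		/* New branch: per-task event, so ignore the PMU cpumask. */
		map_put(evsel->cpus);
		evsel->cpus = map_get(evlist->cpus);
	} else if (evsel->cpus != evsel->own_cpus) {
		map_put(evsel->cpus);
		evsel->cpus = map_get(evsel->own_cpus);
	}
}

/* Returns true when the evsel ends up using the evlist's map. */
static bool toy_uses_evlist_map(bool system_wide, bool evlist_is_dummy)
{
	struct toy_cpu_map pmu = { .refcnt = 1, .nr = 4, .cpu = { 0, 1, 2, 3 } };
	struct toy_cpu_map el = { .refcnt = 1, .nr = 1,
				  .cpu = { evlist_is_dummy ? -1 : 0 } };
	struct toy_evsel evsel = { .cpus = NULL, .own_cpus = &pmu,
				   .system_wide = system_wide };
	struct toy_evlist evlist = { .cpus = &el, .has_user_cpus = false };

	toy_propagate_maps(&evlist, &evsel);
	return evsel.cpus == &el;
}
```

In this model, toy_uses_evlist_map(false, true) is the per-task armv8_pmu
case: the evsel takes the dummy map from the evlist instead of the PMU's 0-3
map. With system_wide set, or with a non-dummy evlist map, it falls through
to own_cpus as before.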

Please add the following Fixes tag to the commit log:
Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")

Thanks
Barry