Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events

From: liwei (GF)
Date: Thu Sep 24 2020 - 10:14:30 EST

Next message: Phil Chang: "Re: Re: [PATCH] [PATCH] ARM64: Setup DMA32 zone size by bootargs"
Previous message: Li Heng: "[PATCH -next] usb: typec: Remove set but not used variable"
Next in thread: Namhyung Kim: "Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Andi,

On 2020/9/23 3:50, Andi Kleen wrote:
> On Tue, Sep 22, 2020 at 12:23:21PM -0700, Andi Kleen wrote:
>>> After debugging, i found the root reason is that the xyarray fd is created
>>> by evsel__open_per_thread() ignoring the cpu passed in
>>> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
>>> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
>>> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
>>> perf_evsel__close_fd_cpu().
>>>
>>> To address this, add a flag to mark this situation and avoid using the
>>> affinity technique when closing/enabling/disabling events.
>>
>> The flag seems like a hack. How about figuring out the correct number of
>> CPUs and using that?
>
> Also would like to understand what's different on ARM64 than other architectures.
> Or could this happen on x86 too?
>

The problem is that when the user requests per-task events, the cpumask is expected
to be NULL (dummy), while the armv8_pmu does have a cpumask, which is inherited by
the evsel. The armv8_pmu's cpumask was added for heterogeneous systems, so this
issue cannot happen on x86.

In fact, the cpumask itself is correct, but it shouldn't be used when we are requesting
per-task events. As these events should be installed on all cores, I doubt how much we
can benefit from the affinity technique, so I chose to add a flag.
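
For illustration only, here is a minimal C sketch of the mismatch described above:
a table sized for the single-entry dummy cpumap is later walked with indexes taken
from a larger PMU cpumap. The names (struct fd_table, fd_table_in_bounds(),
count_out_of_bounds()) are hypothetical stand-ins and not perf's xyarray API, and
the map size of 4 in the usage note is an assumed value, not one from this report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fd xyarray: an ncpus x nthreads table. */
struct fd_table {
	size_t ncpus;    /* rows the table was actually sized for */
	size_t nthreads; /* columns (threads) */
	int *fds;
};

/* Allocate the table; every slot starts at -1, meaning "no fd open". */
static struct fd_table *fd_table_new(size_t ncpus, size_t nthreads)
{
	struct fd_table *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->fds = malloc(ncpus * nthreads * sizeof(*t->fds));
	if (!t->fds) {
		free(t);
		return NULL;
	}
	t->ncpus = ncpus;
	t->nthreads = nthreads;
	for (size_t i = 0; i < ncpus * nthreads; i++)
		t->fds[i] = -1;
	return t;
}

static void fd_table_free(struct fd_table *t)
{
	if (t) {
		free(t->fds);
		free(t);
	}
}

/* True only if (cpu_idx, thread) addresses a slot that was allocated. */
static bool fd_table_in_bounds(const struct fd_table *t, size_t cpu_idx,
			       size_t thread)
{
	return cpu_idx < t->ncpus && thread < t->nthreads;
}

/*
 * Count how many indexes of a cpumap with map_nr entries would fall
 * outside the table if they were used to index it directly.
 */
static size_t count_out_of_bounds(const struct fd_table *t, size_t map_nr)
{
	size_t bad = 0;

	for (size_t i = 0; i < map_nr; i++)
		if (!fd_table_in_bounds(t, i, 0))
			bad++;
	return bad;
}
```

With a table created as fd_table_new(1, 1) (the per-task case, dummy cpumap) and an
assumed PMU cpumap of 4 entries, count_out_of_bounds(t, 4) reports 3 indexes that
would be read past the end of the table, which is the kind of access that segfaults
in perf_evsel__close_fd_cpu().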

I also did a test on a HiSilicon arm64 D06 board with 2 sockets and 128 cores,
running the following command 3 times, with and without the affinity technique:

time tools/perf/perf stat -ddd -C 0-127 --per-core --timeout=5000 2> /dev/null

* (HEAD detached at 7074674e7338) perf cpumap: Maintain cpumaps ordered and without dups
real 0m8.039s
user 0m0.402s
sys 0m2.582s

real 0m7.939s
user 0m0.360s
sys 0m2.560s

real 0m7.997s
user 0m0.358s
sys 0m2.586s

* (HEAD detached at 704e2f5b700d) perf stat: Use affinity for enabling/disabling events
real 0m7.954s
user 0m0.308s
sys 0m2.590s

real 0m12.959s
user 0m0.332s
sys 0m2.582s

real 0m18.009s
user 0m0.346s
sys 0m2.562s

The off-CPU time is much longer when using affinity. I think that is the cost of
migration. Could you please share your test case with me?
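
As a rough way to put a number on "off-CPU time" from the time(1) output above, one
can subtract the CPU time (user + sys) from the wall-clock time. The sketch below
assumes a single-threaded process, so that user + sys never exceeds real; the helper
name off_cpu_seconds() is hypothetical.

```c
/*
 * Rough off-CPU estimate: the part of the wall-clock time that is not
 * accounted for by user + sys CPU time. Only meaningful when the
 * process is effectively single-threaded.
 */
static double off_cpu_seconds(double real, double user, double sys)
{
	return real - (user + sys);
}
```

Applied to the figures above, off_cpu_seconds(8.039, 0.402, 2.582) gives about 5.06s
for the first run at 7074674e7338, while off_cpu_seconds(18.009, 0.346, 2.562) gives
about 15.10s for the last run at 704e2f5b700d, with user and sys nearly unchanged.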

Thanks,
Wei
