Re: [PATCH v5 0/4] Reduce NUMA related overhead in perf record profiling on large server systems

From: Alexey Budankov
Date: Thu Jan 31 2019 - 04:53:03 EST


On 28.01.2019 14:27, Jiri Olsa wrote:
> On Tue, Jan 22, 2019 at 08:45:12PM +0300, Alexey Budankov wrote:
>
> SNIP
>
>> The patch set has been validated on the BT benchmark from the NAS
>> Parallel Benchmarks [2], running on a dual-socket Broadwell system
>> (44 cores, 88 hw threads) with kernels v4.4.0-21-generic (Ubuntu 16.04)
>> and v4.20.0-rc5 (tip perf/core).
>>
>> The patch set is for Arnaldo's perf/core repository.
>>
>> OVERHEAD:
>>                                BENCH REPORT BASED   ELAPSED TIME BASED
>> v4.20.0-rc5
>> (tip perf/core):
>>
>> (current) SERIAL-SYS  / BASE : 1.27x (14.37/11.31), 1.29x (15.19/11.69)
>>           SERIAL-NODE / BASE : 1.15x (13.04/11.31), 1.17x (13.79/11.69)
>>           SERIAL-CPU  / BASE : 1.00x (11.32/11.31), 1.01x (11.89/11.69)
>>
>>           AIO1-SYS    / BASE : 1.29x (14.58/11.31), 1.29x (15.26/11.69)
>>           AIO1-NODE   / BASE : 1.08x (12.23/11.31), 1.11x (13.01/11.69)
>>           AIO1-CPU    / BASE : 1.07x (12.14/11.31), 1.08x (12.83/11.69)
>>
>> v4.4.0-21-generic
>> (Ubuntu 16.04 LTS):
>>
>> (current) SERIAL-SYS  / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>>           SERIAL-NODE / BASE : 1.19x (13.02/10.87), 1.23x (14.03/11.32)
>>           SERIAL-CPU  / BASE : 1.03x (11.21/10.87), 1.07x (12.18/11.32)
>>
>>           AIO1-SYS    / BASE : 1.26x (13.73/10.87), 1.29x (14.69/11.32)
>>           AIO1-NODE   / BASE : 1.10x (12.04/10.87), 1.15x (13.03/11.32)
>>           AIO1-CPU    / BASE : 1.12x (12.20/10.87), 1.15x (13.09/11.32)
>>
>> ---
>> Alexey Budankov (4):
>>   perf record: allocate affinity masks
>>   perf record: bind the AIO user space buffers to nodes
>>   perf record: apply affinity masks when reading mmap buffers
>>   perf record: implement --affinity=node|cpu option
>>
>>  tools/perf/Documentation/perf-record.txt |   5 ++
>>  tools/perf/builtin-record.c              |  45 +++++++++-
>>  tools/perf/perf.h                        |   8 ++
>>  tools/perf/util/cpumap.c                 |  10 +++
>>  tools/perf/util/cpumap.h                 |   1 +
>>  tools/perf/util/evlist.c                 |   6 +-
>>  tools/perf/util/evlist.h                 |   2 +-
>>  tools/perf/util/mmap.c                   | 105 ++++++++++++++++++++++-
>>  tools/perf/util/mmap.h                   |   3 +-
>>  9 files changed, 175 insertions(+), 10 deletions(-)
>>
>> ---
>> Changes in v5:
>> - avoided multiple allocations of online cpu maps by
>>   implementing it once in cpu_map__online()
>> - reduced indentation at record__parse_affinity()
>
> Reviewed-by: Jiri Olsa <jolsa@xxxxxxxxxx>

Thanks!
Alexey

>
> thanks,
> jirka
>
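
For readers checking the OVERHEAD table quoted above: each factor appears to be the profiled run time divided by the baseline run time, e.g. 14.37/11.31 for SERIAL-SYS. A minimal sketch of that arithmetic follows. The function and variable names are illustrative only and are not part of the patch set; the times are copied from the v4.20.0-rc5 bench-report column. Recomputing from the rounded times shown can differ in the last digit from the posted factors.

```python
def overhead(profiled_sec, base_sec):
    """Return the slowdown factor: profiled run time over baseline run time."""
    if base_sec <= 0:
        raise ValueError("baseline time must be positive")
    return profiled_sec / base_sec


# (profiled, baseline) seconds, taken from the quoted table
# (v4.20.0-rc5, bench-report-based column).
samples = {
    "SERIAL-SYS": (14.37, 11.31),
    "SERIAL-NODE": (13.04, 11.31),
    "AIO1-NODE": (12.23, 11.31),
}

# Slowdown factor per configuration, keyed by the table's row label.
ratios = {name: overhead(p, b) for name, (p, b) in samples.items()}

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:.2f}x")
```

A factor close to 1.00x means profiling adds little run time over the unprofiled baseline.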