[PATCH v1 0/3] perf record: adapt NUMA awareness to machines with #CPUs > 1K
From: Alexey Budankov
Date: Wed Nov 20 2019 - 04:33:16 EST
The current glibc implementation of the cpu_set_t type limits its internal
CPU mask to no more than 1024 CPUs. This limitation confines the NUMA
awareness of the perf tool in record mode, provided by the --affinity
option, to the first 1024 CPUs on machines with more CPUs.
This patch set lets the perf tool overcome the 1024-CPU limitation by
introducing a dedicated struct mmap_cpu_mask type and using the tools
bitmap API to manipulate the affinity masks of the tool's thread and of
the mmapped data buffers.
The tools bitmap API is extended with a bitmap_equal() operation whose
implementation is derived from the kernel's.
---
Alexey Budankov (3):
tools bitmap: extend bitmap API with bitmap_equal()
perf mmap: declare type for cpu mask of arbitrary length
perf record: adapt affinity to machines with #CPUs > 1K
tools/include/linux/bitmap.h | 21 +++++++++++++++++++++
tools/lib/bitmap.c | 15 +++++++++++++++
tools/perf/builtin-record.c | 28 ++++++++++++++++++++++------
tools/perf/util/mmap.c | 28 ++++++++++++++++++++++------
tools/perf/util/mmap.h | 11 ++++++++++-
5 files changed, 90 insertions(+), 13 deletions(-)
---
Testing:
$ tools/perf/perf record -v --affinity=cpu -- ls
thread mask[8]: empty
Using CPUID GenuineIntel-6-5E-3
...
mmap size 528384B
0x7f95f8f85010: mmap mask[8]: 0
0x7f95f8f950d8: mmap mask[8]: 1
0x7f95f8fa51a0: mmap mask[8]: 2
0x7f95f8fb5268: mmap mask[8]: 3
0x7f95f8fc5330: mmap mask[8]: 4
0x7f95f8fd53f8: mmap mask[8]: 5
0x7f95f8fe54c0: mmap mask[8]: 6
0x7f95f8ff5588: mmap mask[8]: 7
...
thread mask[8]: 0
thread mask[8]: 1
thread mask[8]: 2
thread mask[8]: 3
arch copy Documentation init kernel MAINTAINERS modules.builtin.modinfo perf.data scripts System.map vmlinux
block COPYING drivers ipc lbuild Makefile modules.order perf.data.old security tools vmlinux.o
certs CREDITS fs Kbuild lib mm Module.symvers README sound usr
config-5.2.7-100.fc29.x86_64 crypto include Kconfig LICENSES modules.builtin net samples stdio virt
thread mask[8]: 4
thread mask[8]: 5
thread mask[8]: 6
thread mask[8]: 7
thread mask[8]: 0
thread mask[8]: 1
thread mask[8]: 2
thread mask[8]: 3
thread mask[8]: 4
thread mask[8]: 5
thread mask[8]: 6
thread mask[8]: 7
[ perf record: Woken up 0 times to write data ]
thread mask[8]: 0
thread mask[8]: 1
thread mask[8]: 2
thread mask[8]: 3
thread mask[8]: 4
thread mask[8]: 5
thread mask[8]: 6
thread mask[8]: 7
...
[ perf record: Captured and wrote 0.014 MB perf.data (11 samples) ]
--
2.20.1