[PATCHSET 0/7] perf lock contention: Improve performance if map is full (v1)

From: Namhyung Kim
Date: Thu Apr 06 2023 - 17:06:19 EST


Hello,

I got a report that the overhead of perf lock contention is too high in
some cases. It was running in the task aggregation mode (-t) at the time,
and there were lots of tasks contending with each other.

It turned out that the hash map update is the problem. The result is saved
in the lock_stat hash map, which is pre-allocated. The BPF program never
deletes data from the map; it only adds. But if the map is full, trying to
update the map becomes a very heavy operation, since it needs to check
every CPU's freelist to get a new node to save the result. But we know
it'd fail when the map is full, so there is no need to try the update then.
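
To illustrate the idea, here is a minimal userspace sketch in plain C. It
is not the BPF code from the series: struct sketch_map, sketch_lookup(),
sketch_insert() and sketch_record() are hypothetical names, and the small
array only stands in for the pre-allocated hash map. It assumes the fix
amounts to remembering the first failed insert and skipping later inserts
for new keys, while still counting them as data failures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pre-allocated, insert-only hash map. */
#define MAX_ENTRIES 4

struct sketch_map {
    int keys[MAX_ENTRIES];
    unsigned long counts[MAX_ENTRIES];
    size_t used;
    bool full;                     /* latched once an insert fails */
    unsigned long data_fail;       /* events dropped because the map is full */
    unsigned long insert_attempts; /* how often the expensive path ran */
};

/* Linear search keeps the sketch short; a real hash map would hash the key. */
static unsigned long *sketch_lookup(struct sketch_map *m, int key)
{
    for (size_t i = 0; i < m->used; i++)
        if (m->keys[i] == key)
            return &m->counts[i];
    return NULL;
}

/* Stand-in for the expensive insert; returns false when no slot is free. */
static bool sketch_insert(struct sketch_map *m, int key)
{
    m->insert_attempts++;
    if (m->used == MAX_ENTRIES)
        return false;
    m->keys[m->used] = key;
    m->counts[m->used] = 1;
    m->used++;
    return true;
}

static void sketch_record(struct sketch_map *m, int key)
{
    unsigned long *cnt = sketch_lookup(m, key);

    if (cnt) {                     /* existing entries can still be updated */
        (*cnt)++;
        return;
    }
    if (m->full) {                 /* known to fail: skip the insert entirely */
        m->data_fail++;
        return;
    }
    if (!sketch_insert(m, key)) {
        m->full = true;            /* later misses take the cheap path above */
        m->data_fail++;
    }
}
```

With 4 slots and 10 distinct keys, the expensive insert path runs only 5
times (4 successes and the one failure that sets the flag); the remaining
5 events are dropped without touching the map.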

I've checked it on my 64-CPU machine with the following benchmark.

  $ perf bench sched messaging -g 1000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 1000 groups == 40000 processes run

       Total time: 2.825 [sec]

I used the task mode to guarantee that the map gets full: the default
number of map entries is 16K and this workload has 40K tasks.

Before:
  $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 1000 groups == 40000 processes run

       Total time: 11.299 [sec]
   contended   total wait     max wait     avg wait          pid   comm

       19284       3.51 s      3.70 ms    181.91 us      1305863   sched-messaging
         243     84.09 ms    466.67 us    346.04 us      1336608   sched-messaging
         177     66.35 ms     12.08 ms    374.88 us      1220416   node

After:
  $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 1000 groups == 40000 processes run

       Total time: 3.044 [sec]
   contended   total wait     max wait     avg wait          pid   comm

       18743    591.92 ms    442.96 us     31.58 us      1431454   sched-messaging
          51    210.64 ms    207.45 ms      4.13 ms      1468724   sched-messaging
          81     68.61 ms     65.79 ms    847.07 us      1463183   sched-messaging

  === output for debug ===

  bad: 1164137, total: 2253341
  bad rate: 51.66 %
  histogram of failure reasons
         task: 0
        stack: 0
         time: 0
         data: 1164137
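
The numbers above are consistent with the bad rate being bad / total as a
percentage, and with every failure in this run being a data (map full)
failure. A small sketch of that arithmetic in plain C follows; the helper
name bad_rate_percent() is hypothetical and this is not the code perf
uses to print the line.

```c
#include <assert.h>

/* Percentage of events that could not be recorded (assumed: bad / total). */
static double bad_rate_percent(unsigned long bad, unsigned long total)
{
    if (total == 0)   /* avoid dividing by zero when nothing was seen */
        return 0.0;
    return 100.0 * (double)bad / (double)total;
}
```

Calling bad_rate_percent(1164137, 2253341) gives roughly 51.66, matching
the debug output.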

The first few patches are small cleanups and fixes. You can get the code
from the 'perf/lock-map-v1' branch in

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung

Namhyung Kim (7):
  perf lock contention: Simplify parse_lock_type()
  perf lock contention: Use -M for --map-nr-entries
  perf lock contention: Update default map size to 16384
  perf lock contention: Add data failure stat
  perf lock contention: Update total/bad stats for hidden entries
  perf lock contention: Revise needs_callstack() condition
  perf lock contention: Do not try to update if hash map is full

 tools/perf/Documentation/perf-lock.txt       |  4 +-
 tools/perf/builtin-lock.c                    | 64 ++++++++-----------
 tools/perf/util/bpf_lock_contention.c        |  7 +-
 .../perf/util/bpf_skel/lock_contention.bpf.c | 29 +++++++--
 tools/perf/util/bpf_skel/lock_data.h         |  3 +
 tools/perf/util/lock-contention.h            |  2 +
 6 files changed, 60 insertions(+), 49 deletions(-)


base-commit: e5116f46d44b72ede59a6923829f68a8b8f84e76
--
2.40.0.577.gac1e443424-goog