Re: [PATCH 0/4] perf lock contention: Improve lock symbol display (v1)

From: Arnaldo Carvalho de Melo
Date: Tue Mar 14 2023 - 08:31:25 EST


Em Mon, Mar 13, 2023 at 06:45:53PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Mar 13, 2023 at 01:48:21PM -0700, Namhyung Kim escreveu:
> > Hello,
> >
> > This patchset improves the symbolization of locks for -l/--lock-addr mode.
> > As of now it only shows global lock symbols present in the kallsyms. But
> > we can add some more lock symbols by traversing pointers in the BPF program.
> >
> > For example, mmap_lock can be reached from the mm_struct of the current task
> > (task_struct->mm->mmap_lock) and we can compare the address of the give lock
> > with it. Similarly I've added 'siglock' for current->sighand->siglock.

Hey, we can go a bit further by using something like pahole's
--expand_types and --expand_pointers and play iterating a type members
and looking for locks, like:

⬢[acme@toolbox pahole]$ pahole task_struct | grep spinlock_t
spinlock_t alloc_lock; /* 3280 4 */
raw_spinlock_t pi_lock; /* 3284 4 */
seqcount_spinlock_t mems_allowed_seq; /* 3616 4 */
⬢[acme@toolbox pahole]$

Expand points will find mmap_lock:

⬢[acme@toolbox pahole]$ pahole --expand_pointers -C task_struct | grep -B10 mmap_lock
} *pgd;
atomic_t membarrier_state;
atomic_t mm_users;
atomic_t mm_count;

/* XXX 4 bytes hole, try to pack */

atomic_long_t pgtables_bytes;
int map_count;
spinlock_t page_table_lock;
struct rw_semaphore mmap_lock;
^C
⬢[acme@toolbox pahole]$


ITs just too much expansion to see task_struct->mm, but it is there, of
course:

⬢[acme@toolbox pahole]$ pahole mm_struct | grep mmap_lock
struct rw_semaphore mmap_lock; /* 120 40 */
⬢[acme@toolbox pahole]$

Also:

⬢[acme@toolbox pahole]$ pahole --contains rw_semaphore
address_space
signal_struct
key
inode
super_block
quota_info
user_namespace
blocking_notifier_head
backing_dev_info
anon_vma
tty_struct
cpufreq_policy
tcf_block
ipc_ids
autogroup
kvm_arch
posix_clock
listener_list
uprobe
kernfs_root
configfs_fragment
ext4_inode_info
ext4_group_info
btrfs_fs_info
extent_buffer
btrfs_dev_replace
btrfs_space_info
btrfs_inode
btrfs_block_group
tpm_chip
ib_device
ib_xrcd
blk_crypto_profile
controller
led_classdev
cppc_pcc_data
dm_snapshot
⬢[acme@toolbox pahole]$

And:

⬢[acme@toolbox pahole]$ pahole --find_pointers_to mm_struct
task_struct: mm
task_struct: active_mm
vm_area_struct: vm_mm
flush_tlb_info: mm
signal_struct: oom_mm
tlb_state: loaded_mm
linux_binprm: mm
mmu_gather: mm
trace_event_raw_xen_mmu_ptep_modify_prot: mm
trace_event_raw_xen_mmu_alloc_ptpage: mm
trace_event_raw_xen_mmu_pgd: mm
trace_event_raw_xen_mmu_flush_tlb_multi: mm
trace_event_raw_hyperv_mmu_flush_tlb_multi: mm
mmu_notifier: mm
mmu_notifier_range: mm
sgx_encl_mm: mm
rq: prev_mm
kvm: mm
cpuset_migrate_mm_work: mm
mmap_unlock_irq_work: mm
delayed_uprobe: mm
map_info: mm
trace_event_raw_mmap_lock: mm
trace_event_raw_mmap_lock_acquire_returned: mm
mm_walk: mm
make_exclusive_args: mm
mmu_interval_notifier: mm
mm_slot: mm
rmap_item: mm
trace_event_raw_mm_khugepaged_scan_pmd: mm
trace_event_raw_mm_collapse_huge_page: mm
trace_event_raw_mm_collapse_huge_page_swapin: mm
mm_slot: mm
move_charge_struct: mm
userfaultfd_ctx: mm
proc_maps_private: mm
remap_pfn: mm
intel_svm: mm
binder_alloc: vma_vm_mm
⬢[acme@toolbox pahole]$

- Arnaldo


> > On the other hand, we can traverse some of semi-global locks like per-cpu,
> > per-device, per-filesystem and so on. I've added 'rqlock' for each cpu's
> > runqueue lock.
> >
> > It cannot cover all types of locks in the system but it'd be fairly usefule
> > if we can add many of often contended locks. I tried to add futex locks
> > but it failed to find the __futex_data symbol from BTF. I'm not sure why but
> > I guess it's because the struct doesn't have a tag name.
> >
> > Those locks are added just because they got caught during my test.
> > It'd be nice if you suggest which locks to add and how to do that. :)
> > I'm thinking if there's a way to track file-based locks (like epoll, etc).
> >
> > Finally I also added a lock type name after the symbols (if any) so that we
> > can get some idea even though it has no symbol. The example output looks
> > like below:
> >
> > $ sudo ./perf lock con -abl -- sleep 1
> > contended total wait max wait avg wait address symbol
> >
> > 44 6.13 ms 284.49 us 139.28 us ffffffff92e06080 tasklist_lock (rwlock)
> > 159 983.38 us 12.38 us 6.18 us ffff8cc717c90000 siglock (spinlock)
> > 10 679.90 us 153.35 us 67.99 us ffff8cdc2872aaf8 mmap_lock (rwsem)
> > 9 558.11 us 180.67 us 62.01 us ffff8cd647914038 mmap_lock (rwsem)
> > 78 228.56 us 7.82 us 2.93 us ffff8cc700061c00 (spinlock)
> > 5 41.60 us 16.93 us 8.32 us ffffd853acb41468 (spinlock)
> > 10 37.24 us 5.87 us 3.72 us ffff8cd560b5c200 siglock (spinlock)
> > 4 11.17 us 3.97 us 2.79 us ffff8d053ddf0c80 rq_lock (spinlock)
> > 1 7.86 us 7.86 us 7.86 us ffff8cd64791404c (spinlock)
> > 1 4.13 us 4.13 us 4.13 us ffff8d053d930c80 rq_lock (spinlock)
> > 7 3.98 us 1.67 us 568 ns ffff8ccb92479440 (mutex)
> > 2 2.62 us 2.33 us 1.31 us ffff8cc702e6ede0 (rwlock)
> >
> > The tasklist_lock is global so it's from the kallsyms. But others like
> > siglock, mmap_lock and rq_lock are from the BPF.
>
> Beautiful :-)
>
> And the csets are _so_ small and demonstrate techniques that should be
> used in more and more tools.
>
> Applied, testing.
>
> - Arnaldo
>
> > You get get the code at 'perf/lock-symbol-v1' branch in
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> >
> > Thanks,
> > Namhyung
> >
> > Namhyung Kim (4):
> > perf lock contention: Track and show mmap_lock with address
> > perf lock contention: Track and show siglock with address
> > perf lock contention: Show per-cpu rq_lock with address
> > perf lock contention: Show lock type with address
> >
> > tools/perf/builtin-lock.c | 46 +++++++----
> > tools/perf/util/bpf_lock_contention.c | 35 ++++++++-
> > .../perf/util/bpf_skel/lock_contention.bpf.c | 77 +++++++++++++++++++
> > tools/perf/util/bpf_skel/lock_data.h | 14 ++++
> > 4 files changed, 152 insertions(+), 20 deletions(-)
> >
> >
> > base-commit: b8fa3e3833c14151a47ebebbc5427dcfe94bb407
> > --
> > 2.40.0.rc1.284.g88254d51c5-goog
> >
>
> --
>
> - Arnaldo

--

- Arnaldo