Re: [PATCH v2] cgroup: avoid css_set_lock in cgroup_css_set_fork()
From: Michal Koutný
Date: Tue Feb 10 2026 - 11:55:13 EST
On Tue, Feb 10, 2026 at 12:19:27PM +0100, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> This is going to depend on the scale you test on. I was testing on
> south of 32. But I also got a miniscule win from removing css set lock
> as the problem for me, instead everything shifted to tasklist.
To be on the same page -- that means you have nr_cpus >= 32?
> Per my other e-mail tasklist lock retains the terrible 3-times locking
> and it is doing rather expensive work while holding it. It is
> plausible it happens to be at the top at that scale, but that's only
> an argument for fixing it. Even if you don't see the css thing at the
> top at the moment, it will be there once someone(tm) sorts out the
> tasklist problem.
I did a quick test (with 6.18.8-1.g886f4c4-default), first `perf top`
while will-it-scale was running:
74.23% [kernel] [k] native_queued_spin_lock_slowpath
6.91% [kernel] [k] intel_idle_irq
0.87% [kernel] [k] update_sd_lb_stats.constprop.0
0.68% [kernel] [k] _raw_spin_lock
0.63% [kernel] [k] clear_page_erms
0.56% [kernel] [k] sched_balance_find_dst_group
0.40% [kernel] [k] alloc_vmap_area
and then bpftrace for the waiters:
$ bpftrace -e 'kprobe:native_queued_spin_lock_slowpath {@[arg0]=count();}
END {for($kv : @) {printf("%s\t%d\n", ksym($kv.0), (int64)$kv.1);} clear(@); }'\
>bpftrace.out
$ sort -k2 -r -n bpftrace.out | head | column -t
pidmap_lock         10482583
nft_pcpu_tun_ctx    3693517
css_set_lock        1511164
input_pool          976252
tasklist_lock       798578
nft_pcpu_tun_ctx    481962
0xffff8abc3ffd55b0  95371
0xffff8a6d3ffd65b0  93686
0xffff8a5e218f0840  29501
0xffff8a5e451dca40  29421
or measured by cumulative waiting time (in nanoseconds):
$ bpftrace -e 'kprobe:native_queued_spin_lock_slowpath {@[cpu]=arg0; @st[cpu]=nsecs;}
kretprobe:native_queued_spin_lock_slowpath /@[cpu]/ {$lat=nsecs-@st[cpu]; @lats[@[cpu]]=sum($lat);}
END {for($kv : @lats) {printf("%s\t%d\n", ksym($kv.0), (int64)$kv.1);} clear(@lats); clear(@st); clear(@) }'\
>bpftrace2.out
$ sort -k2 -r -n bpftrace2.out | head -n15 | column -t
pidmap_lock         1931209805
rcu_state           1823286316
rcu_state           1581455156
rcu_state           1328804835
rcu_state           1299517157
rcu_state           1134101627
nft_pcpu_tun_ctx    1027837665
0xffff8abc3ffd55b0  861441978
0xffff8a6d3ffd65b0  850732998
css_set_lock        520009479
input_pool          316598763
tasklist_lock       127161061
0xffff8aac40023200  32380418
0xffff8a5e002ab600  30194951
rcu_state           18334578
Hm, interesting: css_set_lock ranks well below pidmap_lock by both
measures, which would explain why I saw no big change with
css_set_lock in my setup.
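
One caveat when reading both lists: ksym() resolves an address to the
name of the enclosing symbol, so distinct locks that live inside one
object (presumably the per-node locks embedded in rcu_state, and
separate nft_pcpu_tun_ctx instances) show up as separate rows with the
same name. A minimal sketch for folding such rows together, assuming
the two-column "name<TAB>value" format written by the scripts above;
sum_by_symbol is only a hypothetical helper name:

```shell
# Hypothetical helper: sum the second column per symbol name (first column)
# and print the largest totals first. Reads "name<TAB>value" lines on stdin.
sum_by_symbol() {
    # Skip lines that do not have at least a name and a value.
    awk 'NF >= 2 { total[$1] += $2 }
         END { for (name in total) printf "%s\t%.0f\n", name, total[name] }' |
        sort -k2,2nr
}

# Usage (assumes bpftrace2.out from the run above exists):
# sum_by_symbol < bpftrace2.out | head | column -t
```

Unresolved addresses stay as individual rows, since each raw address is
its own key.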
Michal