On Thu, Mar 30, 2023 at 07:09:11PM -0400, Mathieu Desnoyers wrote:
> Keep track of the currently allocated mm_cid for each mm/cpu rather than
> freeing them immediately. This eliminates most atomic ops when context
> switching back and forth between threads belonging to different memory
> spaces in multi-threaded scenarios (many processes, each with many
> threads).
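
The idea in the quoted description can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel code:
struct mm_sim, cid_alloc(), cid_free(), ops_eager() and ops_cached() are
hypothetical names invented here, a single CPU alternates between two memory
spaces, and reclaiming a cached cid (for example after migration) is left out
entirely. It counts atomic read-modify-write operations under an "allocate on
switch-in, free on switch-out" scheme versus a scheme that keeps the cid
cached per mm/cpu.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS   4
#define CID_UNSET (-1)

/* Hypothetical, simplified per-mm state; NOT the kernel's data structure. */
struct mm_sim {
	atomic_uint cid_mask;        /* bit n set => cid n is allocated */
	int pcpu_cid[NR_CPUS];       /* cid cached by each CPU, or CID_UNSET */
	unsigned long atomic_ops;    /* atomic RMW operations performed so far */
};

static void mm_sim_init(struct mm_sim *mm)
{
	atomic_init(&mm->cid_mask, 0);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		mm->pcpu_cid[cpu] = CID_UNSET;
	mm->atomic_ops = 0;
}

/* Claim the lowest free cid; each attempt costs one atomic fetch-or. */
static int cid_alloc(struct mm_sim *mm)
{
	for (int cid = 0; cid < NR_CPUS; cid++) {
		unsigned int bit = 1u << cid;

		mm->atomic_ops++;
		if (!(atomic_fetch_or(&mm->cid_mask, bit) & bit))
			return cid;
	}
	return CID_UNSET;
}

/* Release a cid; costs one atomic fetch-and. */
static void cid_free(struct mm_sim *mm, int cid)
{
	assert(cid >= 0 && cid < NR_CPUS);
	mm->atomic_ops++;
	atomic_fetch_and(&mm->cid_mask, ~(1u << cid));
}

/* Eager scheme: allocate on every switch-in, free on every switch-out. */
unsigned long ops_eager(int switches)
{
	struct mm_sim mms[2];

	mm_sim_init(&mms[0]);
	mm_sim_init(&mms[1]);
	for (int i = 0; i < switches; i++) {
		struct mm_sim *mm = &mms[i & 1];
		int cid = cid_alloc(mm);     /* switch-in */

		cid_free(mm, cid);           /* switch-out */
	}
	return mms[0].atomic_ops + mms[1].atomic_ops;
}

/* Cached scheme: CPU 0 keeps the cid it already holds for each mm. */
unsigned long ops_cached(int switches)
{
	struct mm_sim mms[2];
	const int cpu = 0;

	mm_sim_init(&mms[0]);
	mm_sim_init(&mms[1]);
	for (int i = 0; i < switches; i++) {
		struct mm_sim *mm = &mms[i & 1];

		/* Only the first switch-in for this mm/cpu pair allocates. */
		if (mm->pcpu_cid[cpu] == CID_UNSET)
			mm->pcpu_cid[cpu] = cid_alloc(mm);
		/* switch-out: nothing to do, the cid stays cached */
	}
	return mms[0].atomic_ops + mms[1].atomic_ops;
}
```

In this model, ops_eager(1000) counts 2000 atomic operations while
ops_cached(1000) counts 2, one per memory space. The real patch also has to
reclaim cached cids at some point, which this sketch does not attempt to show.
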
Good news: the lock contention is gone and is back to the v6.2 level:
node0_0.profile: 0.07% 0.07% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_1.profile: 0.06% 0.06% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_2.profile: 0.09% 0.09% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_3.profile: 0.08% 0.08% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_4.profile: 0.09% 0.09% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_5.profile: 0.10% 0.10% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_6.profile: 0.10% 0.10% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_7.profile: 0.07% 0.07% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_8.profile: 0.08% 0.08% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node0_9.profile: 0.06% 0.06% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_0.profile: 0.41% 0.41% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_1.profile: 0.38% 0.38% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_2.profile: 0.44% 0.44% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_3.profile: 5.64% 5.64% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_4.profile: 6.08% 6.08% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_5.profile: 3.45% 3.45% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_6.profile: 2.09% 2.09% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_7.profile: 2.72% 2.72% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_8.profile: 0.16% 0.16% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
node1_9.profile: 0.15% 0.15% [kernel.vmlinux] [k] native_queued_spin_lock_slowpath
(The few profiles from node1's CPUs that show more than 2% contention
get it from thermal functions.)
Tested-by: Aaron Lu <aaron.lu@xxxxxxxxx> # lock contention part