On 10/2/24 9:53 AM, Mathieu Desnoyers wrote:
On 2024-10-02 17:36, Mathieu Desnoyers wrote:
On 2024-10-02 17:33, Matthew Wilcox wrote:
On Wed, Oct 02, 2024 at 11:26:27AM -0400, Mathieu Desnoyers wrote:
On 2024-10-02 16:09, Paul E. McKenney wrote:
On Tue, Oct 01, 2024 at 09:02:01PM -0400, Mathieu Desnoyers wrote:
Hazard pointers appear to be a good fit for replacing refcount based lazy
active mm tracking.
Highlight:
will-it-scale context_switch1_threads

nr threads (-t)    speedup
     24              +3%
     48             +12%
     96             +21%
    192             +28%
Impressive!!!
I have to ask... Any data for smaller numbers of CPUs?
Sure, but they are far less exciting ;-)
How many CPUs in the system under test?
2 sockets, 96 cores per socket:
CPU(s): 384
On-line CPU(s) list: 0-383
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 68%
CPU max MHz: 3709.0000
CPU min MHz: 400.0000
BogoMIPS: 4800.00
Note that Jens Axboe got even more impressive speedups testing this
on his 512-hw-thread EPYC [1] (390% speedup for 192 threads). I've
noticed I had schedstats and sched debug enabled in my config, so
I'll have to re-run my tests.
A quick re-run of the 128-thread case with schedstats and sched debug
disabled still shows around 26% speedup, similar to my prior numbers.
I'm not sure why Jens has much better speedups on a similar system.
I'm attaching my config in case someone spots anything obvious. Note
that my BIOS is configured to show 24 NUMA nodes to the kernel (one
NUMA node per core complex).
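For anyone comparing configs, the re-run with schedstats and sched debug
disabled would correspond to a fragment roughly like the one below. This
is only a sketch: it assumes the usual Kconfig symbols CONFIG_SCHEDSTATS
and CONFIG_SCHED_DEBUG, which are not quoted anywhere in this thread, so
check them against the attached config.

```
# Assumed symbol names, not copied from the attached .config
# CONFIG_SCHEDSTATS is not set
# CONFIG_SCHED_DEBUG is not set
```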
Here's my .config. Note that it's from the stock kernel run, which is
why it still has:
CONFIG_MMU_LAZY_TLB_REFCOUNT=y
set. I have the same NUMA configuration as you, but end up with 32 nodes
on this box.