Re: [RFC PATCH v6 0/5] mm: Hot page tracking and promotion infrastructure

From: Bharata B Rao

Date: Mon Mar 23 2026 - 06:05:27 EST


Graph500 results

Test system details
-------------------
3 node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)

$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node     0    1    2
  0:    10   32   50
  1:    32   10   60
  2:   255  255   10

Hotness sources
---------------
NUMAB0 - NUMA Balancing disabled in the base case; no hotness source enabled
         in the pghot case. No migrations occur.
NUMAB2 - Existing hot page promotion in the base case; hint faults used as
         the hotness source in the pghot case.
NUMAB3 - Both regular and tiering modes of NUMA Balancing enabled
         (kernel.numa_balancing=3)

Pghot by default promotes after two accesses but for NUMAB2 source,
promotion is done after one access to match the base behaviour.
(/sys/kernel/debug/pghot/freq_threshold=1)
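
A minimal sketch of how the pghot NUMAB2 run could be configured. The
freq_threshold path is the one quoted above. Mapping NUMAB2 to
kernel.numa_balancing=2 (the tiering mode) is an assumption made by analogy
with the NUMAB3 setting; any other pghot source-selection knobs the series
may require are not shown here.

```shell
# Run as root, with debugfs mounted at /sys/kernel/debug.

# Assumption: NUMAB2 corresponds to the tiering mode of NUMA Balancing.
sysctl -w kernel.numa_balancing=2

# Promote after a single access instead of the pghot default of two.
echo 1 > /sys/kernel/debug/pghot/freq_threshold

# Read the value back to confirm it took effect.
cat /sys/kernel/debug/pghot/freq_threshold
```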

Graph500 details
----------------
Command: mpirun -n 128 --bind-to core --map-by core
graph500/src/graph500_reference_bfs 28 16

After the graph creation, the processes are stopped and data is migrated
to CXL node 2 before continuing so that BFS phase starts accessing lower
tier memory.
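
The mail does not say how the stop-and-migrate step was carried out. One
possible way, sketched below, is to SIGSTOP the ranks, move their pages with
migratepages(8) from the numactl package, and then SIGCONT them. The helper
names (run, stop_and_migrate) and the DRY_RUN switch are hypothetical and
exist only for this illustration.

```shell
#!/bin/sh
# Sketch: pause the benchmark ranks, move their pages to the CXL node, resume.
# Assumption (not from the original mail): migratepages(8) does the move.

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# stop_and_migrate FROM_NODES TO_NODE PID...
stop_and_migrate() {
    from=$1
    to=$2
    shift 2
    # Stop every rank first so none keeps touching memory during the move.
    for pid in "$@"; do
        run kill -STOP "$pid"
    done
    # Move each rank's pages from the source nodes to the target node.
    for pid in "$@"; do
        run migratepages "$pid" "$from" "$to"
    done
    # Resume the ranks; BFS now starts from lower tier memory.
    for pid in "$@"; do
        run kill -CONT "$pid"
    done
}
```

Called as "stop_and_migrate 0,1 2 <pids>" once graph creation has finished,
this moves everything from nodes 0 and 1 to node 2. Detecting the end of graph
creation and collecting the PIDs are left out of the sketch.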

Total memory usage is slightly over 100GB and will fit within Node 0 and 1.
Hence there is no memory pressure to induce demotions.

harmonic_mean_TEPS - Higher is better
=====================================================================================
                        Base            Base            pghot-default   pghot-precise
                        NUMAB0          NUMAB2          NUMAB2          NUMAB2
=====================================================================================
harmonic_mean_TEPS      5.07693e+08     7.08679e+08     5.56854e+08     7.39417e+08
mean_time               8.45968         6.06046         7.71283         5.80853
median_TEPS             5.08914e+08     7.23181e+08     5.51614e+08     7.58993e+08
max_TEPS                5.15226e+08     1.01654e+09     7.75233e+08     9.69136e+08

pgpromote_success       0               13797978        13746431        13752523
numa_pte_updates        0               26727341        39998363        48374479
numa_hint_faults        0               13798301        24459996        32728927
=====================================================================================
                        pghot-default
                        NUMAB3
=====================================================================================
harmonic_mean_TEPS      7.18678e+08
mean_time               5.97614
median_TEPS             7.376e+08
max_TEPS                7.47337e+08

pgpromote_success       13821625
numa_pte_updates        93534398
numa_hint_faults        69164048
=====================================================================================
- The base case shows a good improvement in harmonic_mean_TEPS with NUMAB2
  (about 40% over NUMAB0).
- pghot-precise maintains that improvement (about 46% over NUMAB0).
- pghot-default with NUMAB2 shows only a small gain (about 10% over NUMAB0)
  despite achieving similar page promotion numbers. This mode doesn't track
  the accessing NID and by default promotes to NID=0, which probably isn't all
  that beneficial as processes are running on both Node 0 and Node 1.
- pghot-default recovers the performance (about 42% over NUMAB0) with NUMAB3,
  i.e. when balancing between toptier nodes 0 and 1 is enabled in addition to
  hot page promotion.
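
To put numbers on these observations, the relative change in
harmonic_mean_TEPS against the Base/NUMAB0 run can be computed directly from
the table. The snippet below is only a convenience sketch; the pct_gain helper
is a hypothetical name and the values are copied from the results above.

```shell
#!/bin/sh
# Relative gain in harmonic_mean_TEPS versus the Base/NUMAB0 run.
baseline=5.07693e+08

# pct_gain BASELINE VALUE: print the percentage gain with one decimal place.
pct_gain() {
    awk -v b="$1" -v v="$2" 'BEGIN { printf "%.1f\n", (v / b - 1) * 100 }'
}

for entry in \
    "Base/NUMAB2=7.08679e+08" \
    "pghot-default/NUMAB2=5.56854e+08" \
    "pghot-precise/NUMAB2=7.39417e+08" \
    "pghot-default/NUMAB3=7.18678e+08"
do
    name=${entry%%=*}   # text before the '='
    teps=${entry#*=}    # text after the '='
    printf '%-22s %s%%\n' "$name" "$(pct_gain "$baseline" "$teps")"
done
```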