Re: [RFC PATCH v5 00/10] mm: Hot page tracking and promotion infrastructure
From: Bharata B Rao
Date: Mon Feb 23 2026 - 09:28:03 EST
On 29-Jan-26 8:10 PM, Bharata B Rao wrote:
>
> Results
> =======
> TODO: Will post benchmark numbers as reply to this patchset soon.
>
Here are some numbers from the NAS Parallel Benchmark (NPB) BT application:
Test system details
-------------------
3-node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)
$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node 0 1 2
0: 10 32 50
1: 32 10 60
2: 255 255 10
Hotness sources
---------------
NUMAB0 - Without NUMA Balancing in base case and with no source enabled
in the pghot case. No migrations occur.
NUMAB2 - Existing hot page promotion for the base case and
use of hint faults as source in the pghot case.
Both promotion and demotion are enabled in this case.
Pghot by default promotes after two accesses but for NUMAB2 source,
promotion is done after one access to match the base behaviour.
(/sys/kernel/debug/pghot/freq_threshold=1)
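
For anyone reproducing the NUMAB2 runs, here is a minimal sketch of setting
the threshold from a script. The path is the one quoted above. That the knob
takes a plain decimal integer and reads back the same value is an assumption,
and the helper name set_freq_threshold is hypothetical. It needs root, a
mounted debugfs and a kernel carrying this patchset.

```python
from pathlib import Path

# Path as quoted in this mail; it exists only on a kernel with the pghot patches.
PGHOT_FREQ_THRESHOLD = Path("/sys/kernel/debug/pghot/freq_threshold")


def set_freq_threshold(value, knob=PGHOT_FREQ_THRESHOLD):
    """Write the access-count threshold and return the value read back.

    Assumption: the knob accepts and reports a plain decimal integer.
    """
    knob.write_text(f"{value}\n")
    return int(knob.read_text().strip())
```

Calling set_freq_threshold(1) would correspond to the promote-after-one-access
setting used for the NUMAB2 source.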
NAS-BT details
--------------
Command: mpirun -np 16 /usr/bin/numactl --cpunodebind=0,1 NPB3.4.4/NPB3.4-MPI/bin/bt.F.x
While class D uses around 24GB of memory (which is too little to show the
benefit of promotion), class E results in around 368GB of memory, which
overflows my top tier. Hence I wanted something in between these classes, so I
modified class F to a problem size of 768, which results in around 160GB of
memory.
After the memory consumption stabilizes, all the rank PIDs are paused and
their memory is moved to the CXL node using the migratepages command. This
simulates memory residing on the lower-tier node, with accesses by the BT
processes leading to promotion.
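
The pause/migrate step can be sketched roughly as below. This is not the exact
script used for these runs. Pausing with SIGSTOP and resuming with SIGCONT is
an assumption, the node lists just mirror the topology of this test system,
and the helper names are hypothetical. migratepages(8) from the numactl
package takes "pid from-nodes to-nodes".

```python
import os
import signal
import subprocess


def migratepages_cmd(pid, from_nodes, to_nodes):
    """Build the argv for migratepages(8): migratepages pid from-nodes to-nodes."""
    return ["migratepages", str(pid), from_nodes, to_nodes]


def push_to_lower_tier(pids, from_nodes="0,1", to_nodes="2",
                       run=subprocess.run, kill=os.kill):
    """Pause every rank, move its pages to the CXL node, then resume it.

    run and kill are injectable so the sequence can be dry-run without
    touching real processes.
    """
    for pid in pids:
        kill(pid, signal.SIGSTOP)          # assumed way of pausing the ranks
    try:
        for pid in pids:
            run(migratepages_cmd(pid, from_nodes, to_nodes), check=True)
    finally:
        for pid in pids:
            kill(pid, signal.SIGCONT)      # always resume, even on failure
```

Passing the list of rank PIDs to push_to_lower_tier() leaves the whole working
set on node 2, so every later access from the CPUs of nodes 0 and 1 hits the
lower tier until promotion kicks in.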
Time in seconds - Lower is better
Mop/s total - Higher is better
=====================================================================================
                          Base          Base          pghot-default  pghot-precise
                          NUMAB0        NUMAB2        NUMAB2         NUMAB2
=====================================================================================
Time in seconds           7349.86       4422.50       6219.71        4113.56
Mop/s total               53247.66      88493.630     62923.030      95139.810
pgpromote_success         0             42181834      248503390      41955718
pgpromote_candidate       0             0             577086192      0
pgpromote_candidate_nrl   0             42181834      29410329       41956171
pgdemote_kswapd           0             0             216489010      0
numa_pte_updates          0             42252749      607470975      42037882
numa_hint_faults          0             42183772      606540729      41968150
=====================================================================================
- In the base case, the benchmark numbers improve significantly due to hot page
promotion.
- Though the benchmark runs for more than an hour, the pages get promoted
within the first few minutes.
- pghot-precise is able to match the base case numbers.
- The benchmark suffers in the pghot-default case due to promotion being
limited to the default NID (0) only. This leads to excessive PTE updates, hint
faults, demotion and promotion churn.
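
To put the observations above into ratios, here is a small sketch that derives
speedups and promotion churn from the table. The figures are copied from the
table as-is. The dictionary keys and helper names are just labels chosen for
this illustration.

```python
# Run time (seconds) and pgpromote_success, copied from the results table.
time_s = {
    "base-NUMAB0": 7349.86,
    "base-NUMAB2": 4422.50,
    "pghot-default-NUMAB2": 6219.71,
    "pghot-precise-NUMAB2": 4113.56,
}
pgpromote_success = {
    "base-NUMAB0": 0,
    "base-NUMAB2": 42181834,
    "pghot-default-NUMAB2": 248503390,
    "pghot-precise-NUMAB2": 41955718,
}


def speedup(config, baseline="base-NUMAB0"):
    """Run-time speedup of config relative to baseline (higher is better)."""
    return time_s[baseline] / time_s[config]


def promotion_ratio(config, reference="base-NUMAB2"):
    """Successful promotions in config relative to the reference run."""
    return pgpromote_success[config] / pgpromote_success[reference]


for name in time_s:
    print(f"{name:22s} speedup over NUMAB0: {speedup(name):.2f}x")
print(f"pghot-default promotions vs base NUMAB2: "
      f"{promotion_ratio('pghot-default-NUMAB2'):.2f}x")
```

This gives roughly 1.66x for base NUMAB2, 1.18x for pghot-default and 1.79x
for pghot-precise over the NUMAB0 run, while pghot-default performs close to
5.9 times as many successful promotions as base NUMAB2, which is the churn
mentioned in the last point.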