Re: [RFC PATCH v3 3/8] mm: Hot page tracking and promotion

From: Alok Rathore

Date: Wed Nov 26 2025 - 08:50:04 EST


On 10/11/25 10:53AM, Bharata B Rao wrote:
This introduces a sub-system for collecting memory access
information from different sources. It maintains the hotness
information based on the access history and time of access.

Additionally, it provides per-lowertier-node kernel threads
(named kmigrated) that periodically promote the pages that
are eligible for promotion.

Sub-systems that generate hot page access info can report that
using this API:

int pghot_record_access(unsigned long pfn, int nid, int src,
unsigned long time)

@pfn: The PFN of the memory accessed
@nid: The accessing NUMA node ID
@src: The temperature source (sub-system) that generated the
access info
@time: The access time in jiffies

Some temperature sources may not provide the nid from which
the page was accessed. This is true for sources that use
page table scanning for the PTE Accessed bit. For such sources,
the default toptier node to which such pages should be promoted
is hard coded.
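
As a rough illustration of the reporting API and the default-node fallback described above, here is a minimal userspace sketch. Only the pghot_record_access() signature comes from the patch description; the body is a stand-in, and the demo_* names, DEMO_DEFAULT_TOPTIER_NID and the source IDs are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>

#define NUMA_NO_NODE (-1)            /* same value the kernel uses */
#define DEMO_DEFAULT_TOPTIER_NID 0   /* hypothetical hard-coded default target */
#define DEMO_MAX_PFN 1024UL          /* hypothetical size of the demo table */

/* Hypothetical source IDs; the real ones are defined by the patch. */
enum demo_src { DEMO_SRC_HW_HINT, DEMO_SRC_PTE_SCAN, DEMO_SRC_MAX };

struct demo_record {
	int target_nid;         /* node the page would be promoted to */
	int src;                /* which source reported the access */
	unsigned long time;     /* access time as reported (may be approximate) */
};

static struct demo_record demo_table[DEMO_MAX_PFN];

/* Userspace stand-in with the signature quoted in the patch description. */
int pghot_record_access(unsigned long pfn, int nid, int src,
			unsigned long time)
{
	if (pfn >= DEMO_MAX_PFN || src < 0 || src >= DEMO_SRC_MAX)
		return -EINVAL;

	/* Sources such as PTE A bit scanning cannot tell which node accessed. */
	if (nid == NUMA_NO_NODE)
		nid = DEMO_DEFAULT_TOPTIER_NID;

	demo_table[pfn].target_nid = nid;
	demo_table[pfn].src = src;
	demo_table[pfn].time = time;
	return 0;
}

/* One source that knows the accessing node, and one that does not. */
static void demo_usage(void)
{
	int ret;

	ret = pghot_record_access(42, 1, DEMO_SRC_HW_HINT, 1000UL);
	assert(ret == 0);
	ret = pghot_record_access(43, NUMA_NO_NODE, DEMO_SRC_PTE_SCAN, 1001UL);
	assert(ret == 0);
	(void)ret;
}
```

In this sketch the fallback happens inside the recording function, so callers that lack a node ID simply pass NUMA_NO_NODE.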

Also, the access time provided by some sources may at best be
considered approximate. This is especially true for hot pages
detected by PTE A bit scanning.

The hotness information is stored for every page of lower
tier memory in an unsigned long variable that is part of
mem_section data structure.
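
To make the "one unsigned long per page" idea concrete, here is a small sketch of packing a target nid, an access frequency and an access time into a single word. The field widths and bit positions are assumptions chosen for illustration (they fit in 32 bits so the example also works where unsigned long is 32 bits wide); the actual layout used by the patch is not shown in this mail.

```c
#include <assert.h>

/* Hypothetical field widths; the patch's real layout may differ. */
#define DEMO_NID_BITS   8
#define DEMO_FREQ_BITS  4
#define DEMO_TIME_BITS  20

#define DEMO_NID_SHIFT  0
#define DEMO_FREQ_SHIFT (DEMO_NID_SHIFT + DEMO_NID_BITS)
#define DEMO_TIME_SHIFT (DEMO_FREQ_SHIFT + DEMO_FREQ_BITS)

#define DEMO_MASK(bits) ((1UL << (bits)) - 1)

/* Pack the three fields into one word. */
static unsigned long demo_pack(unsigned long nid, unsigned long freq,
			       unsigned long time)
{
	assert(nid <= DEMO_MASK(DEMO_NID_BITS));

	/* Frequency saturates at the largest representable value. */
	if (freq > DEMO_MASK(DEMO_FREQ_BITS))
		freq = DEMO_MASK(DEMO_FREQ_BITS);

	/* Only the low bits of the time are kept, so it wraps around. */
	time &= DEMO_MASK(DEMO_TIME_BITS);

	return (nid << DEMO_NID_SHIFT) |
	       (freq << DEMO_FREQ_SHIFT) |
	       (time << DEMO_TIME_SHIFT);
}

/* Split a packed word back into its fields. */
static void demo_unpack(unsigned long word, unsigned long *nid,
			unsigned long *freq, unsigned long *time)
{
	*nid = (word >> DEMO_NID_SHIFT) & DEMO_MASK(DEMO_NID_BITS);
	*freq = (word >> DEMO_FREQ_SHIFT) & DEMO_MASK(DEMO_FREQ_BITS);
	*time = (word >> DEMO_TIME_SHIFT) & DEMO_MASK(DEMO_TIME_BITS);
}
```

Truncating the time like this is one reason a stored timestamp can only be treated as approximate.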

kmigrated is a per-lowertier-node kernel thread that migrates
the folios marked for migration in batches. Each kmigrated
thread walks the PFN range spanning its node and checks
for potential migration candidates.
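
The batched walk described above could be sketched in userspace roughly as follows. This is not the patch's code: the candidate array stands in for the per-PFN hotness check, demo_flush() stands in for the actual migration of a batch, and DEMO_BATCH is an arbitrary batch size picked for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BATCH 4	/* hypothetical batch size */

struct demo_walk_stats {
	unsigned long migrated;	/* total PFNs handed to migration */
	unsigned long batches;	/* number of non-empty batches flushed */
};

/* Stand-in for migrating one batch; a real version would move the folios. */
static void demo_flush(unsigned long *batch, size_t *count,
		       struct demo_walk_stats *st)
{
	(void)batch;
	if (*count == 0)
		return;
	st->migrated += *count;
	st->batches++;
	*count = 0;
}

/* Walk [start_pfn, end_pfn) and flush candidates DEMO_BATCH at a time. */
static struct demo_walk_stats demo_walk(unsigned long start_pfn,
					unsigned long end_pfn,
					const bool *candidate)
{
	struct demo_walk_stats st = { 0, 0 };
	unsigned long batch[DEMO_BATCH];
	size_t count = 0;
	unsigned long pfn;

	assert(start_pfn <= end_pfn);

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!candidate[pfn - start_pfn])
			continue;
		batch[count++] = pfn;
		if (count == DEMO_BATCH)
			demo_flush(batch, &count, &st);
	}

	/* Flush whatever is left over after the walk. */
	demo_flush(batch, &count, &st);
	return st;
}
```

The final flush after the loop matters: without it, a partially filled last batch would never be migrated.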

Signed-off-by: Bharata B Rao <bharata@xxxxxxx>
---
include/linux/mmzone.h | 14 ++
include/linux/pghot.h | 52 ++++
include/linux/vm_event_item.h | 4 +
mm/Kconfig | 11 +
mm/Makefile | 1 +
mm/mm_init.c | 10 +
mm/page_ext.c | 11 +
mm/pghot.c | 446 ++++++++++++++++++++++++++++++++++
mm/vmstat.c | 4 +
9 files changed, 553 insertions(+)
create mode 100644 include/linux/pghot.h
create mode 100644 mm/pghot.c

+

<snip>

+/*
+ * Walks the PFNs of the zone, isolates and migrates them in batches.
+ */
+static void kmigrated_walk_zone(unsigned long start_pfn, unsigned long end_pfn,
+ int src_nid)
+{
+ int cur_nid = NUMA_NO_NODE;
+ LIST_HEAD(migrate_list);
+ int batch_count = 0;
+ struct folio *folio;
+ struct page *page;
+ unsigned long pfn;
+
+ pfn = start_pfn;
+ do {
+ unsigned long nid = NUMA_NO_NODE, freq = 0, time = 0, nr = 1;
+
+ if (!pfn_valid(pfn))
+ goto out_next;
+
+ page = pfn_to_online_page(pfn);
+ if (!page)
+ goto out_next;
+
+ folio = page_folio(page);
+ nr = folio_nr_pages(folio);
+ if (folio_nid(folio) != src_nid)
+ goto out_next;
+
+ if (!folio_test_lru(folio))
+ goto out_next;
+
+ if (pghot_get_hotness(pfn, &nid, &freq, &time))

Better to remove the freq value; it's not used later.

Regards,
Alok Rathore