Re: [RFC PATCH v4 3/9] mm: Hot page tracking and promotion
From: Alok Rathore
Date: Mon Dec 22 2025 - 05:28:23 EST
On 06/12/25 03:44PM, Bharata B Rao wrote:
This introduces a sub-system for collecting memory access
information from different sources. It maintains the hotness
information based on the access history and time of access.
Additionally, it provides per-lowertier-node kernel threads
(named kmigrated) that periodically promote the pages that
are eligible for promotion.
Sub-systems that generate hot page access info can report it
using this API:
int pghot_record_access(unsigned long pfn, int nid, int src,
unsigned long time)
@pfn: The PFN of the memory accessed
@nid: The accessing NUMA node ID
@src: The temperature source (sub-system) that generated the
access info
@time: The access time in jiffies
Some temperature sources may not provide the nid from which
the page was accessed. This is true for sources that use
page table scanning for the PTE Accessed bit. For such sources,
the default toptier node to which such pages should be promoted
is hard-coded.
The hotness information is stored for every page of lower
tier memory in an unsigned long variable that is part of
the mem_section data structure.
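A rough userspace model of where each page's hotness word could live, assuming one word per PFN hung off each sparsemem section and the x86-64 defaults of 4 KiB pages and 128 MiB sections (SECTION_SIZE_BITS = 27). struct section_model, section_model_init(), pfn_to_hot() and hot_bytes_per_section() are hypothetical names for illustration only, not the patch's code; the word is modelled as uint64_t, matching unsigned long on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed x86-64 defaults: 4 KiB pages, 128 MiB sparsemem sections. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)

static_assert(PAGES_PER_SECTION == 32768, "128 MiB / 4 KiB pages");

/* Hypothetical stand-in for the hotness array attached to a mem_section. */
struct section_model {
	uint64_t *hot;	/* one hotness word per PFN in the section */
};

/* Allocate a zeroed hotness word for every page in the section. */
static int section_model_init(struct section_model *ms)
{
	ms->hot = calloc(PAGES_PER_SECTION, sizeof(*ms->hot));
	return ms->hot ? 0 : -1;
}

/* Upper PFN bits select the section, lower bits select the slot in it. */
static uint64_t *pfn_to_hot(struct section_model *sections, unsigned long pfn)
{
	struct section_model *ms = &sections[pfn >> PFN_SECTION_SHIFT];

	return &ms->hot[pfn & (PAGES_PER_SECTION - 1)];
}

/* Metadata cost of one section for a given per-PFN word size in bytes. */
static size_t hot_bytes_per_section(size_t word_size)
{
	return PAGES_PER_SECTION * word_size;
}
```

Under these assumptions a 64-bit word costs 256 KiB of metadata per 128 MiB section (1/512, about 0.2% of lower-tier memory), and a 32-bit word would halve that.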
kmigrated is a per-lowertier-node kernel thread that migrates
the folios marked for migration in batches. Each kmigrated
thread walks the PFN range spanning its node and checks
for potential migration candidates.
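One pass of the batched walk can be modelled in userspace roughly as below, assuming the per-PFN words sit in a flat array covering the node's PFN range and that bit 63 (PGHOT_MIGRATE_READY in the patch) marks a candidate. kmigrated_scan_batch() is a hypothetical name; the real thread works on folios and hands them to the migration code, which out[] merely stands in for here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGHOT_MIGRATE_READY        63
#define KMIGRATED_DEFAULT_BATCH_NR 512

static_assert(PGHOT_MIGRATE_READY < 64, "ready bit must fit in the hotness word");

/*
 * Hypothetical model of one kmigrated pass: hot[i] holds the hotness word
 * for pfn start_pfn + i. Starting at from_pfn, up to batch_nr ready PFNs
 * are collected into out[] and their ready bit is cleared. Returns the
 * number collected; *next_pfn is where the next pass should resume.
 */
static size_t kmigrated_scan_batch(uint64_t *hot, unsigned long start_pfn,
				   unsigned long end_pfn, unsigned long from_pfn,
				   unsigned long *out, size_t batch_nr,
				   unsigned long *next_pfn)
{
	const uint64_t ready = UINT64_C(1) << PGHOT_MIGRATE_READY;
	size_t n = 0;
	unsigned long pfn;

	for (pfn = from_pfn; pfn < end_pfn && n < batch_nr; pfn++) {
		uint64_t *w = &hot[pfn - start_pfn];

		if (*w & ready) {
			*w &= ~ready;	/* claim the page for this batch */
			out[n++] = pfn;
		}
	}
	*next_pfn = pfn;
	return n;
}
```

Returning a resume point keeps each pass bounded by the batch size, so a large node is covered over several wakeups instead of one long scan.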
Tunables for enabling the different hotness sources and for
setting target_nid and the frequency threshold are provided in debugfs.
Signed-off-by: Bharata B Rao <bharata@xxxxxxx>
<snip>
+++ b/include/linux/pghot.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PGHOT_H
+#define _LINUX_PGHOT_H
+
+/* Page hotness temperature sources */
+enum pghot_src {
+ PGHOT_HW_HINTS,
+ PGHOT_PGTABLE_SCAN,
+ PGHOT_HINT_FAULT,
+};
+
+#ifdef CONFIG_PGHOT
+/*
+ * Bit positions to enable individual sources in pghot/records_enabled
+ * of debugfs.
+ */
+enum pghot_src_enabled {
+ PGHOT_HWHINTS_BIT = 0,
+ PGHOT_PGTSCAN_BIT,
+ PGHOT_HINTFAULT_BIT,
+ PGHOT_MAX_BIT
+};
+
+#define PGHOT_HWHINTS_ENABLED BIT(PGHOT_HWHINTS_BIT)
+#define PGHOT_PGTSCAN_ENABLED BIT(PGHOT_PGTSCAN_BIT)
+#define PGHOT_HINTFAULT_ENABLED BIT(PGHOT_HINTFAULT_BIT)
+#define PGHOT_SRC_ENABLED_MASK GENMASK(PGHOT_MAX_BIT - 1, 0)
+
+#define PGHOT_DEFAULT_FREQ_WINDOW (5 * MSEC_PER_SEC)
+#define PGHOT_DEFAULT_FREQ_THRESHOLD 2
+
+#define KMIGRATED_DEFAULT_SLEEP_MS 100
+#define KMIGRATED_DEFAULT_BATCH_NR 512
+
+#define PGHOT_DEFAULT_NODE 0
+
+/*
+ * Bits 0-31 are used to store nid, frequency and time.
+ * Bits 32-62 are unused now.
+ * Bit 63 is used to indicate the page is ready for migration.
+ */
+#define PGHOT_MIGRATE_READY 63
+
+#define PGHOT_NID_WIDTH 10
+#define PGHOT_FREQ_WIDTH 3
+/* time is stored in 19 bits which can represent up to 8.73s with HZ=1000 */
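With HZ=1000, a 19-bit jiffies field spans 2^19 = 524288 ms, which is about 8.73 minutes, not 8.73 seconds. I think the comment says 8.73s by mistake.
Suggestion:
If the goal is to promote a page within ~8 seconds, then 13 bits would be enough (2^13 = 8192 jiffies, i.e. ~8.19s at HZ=1000). nid, frequency and time would then need only 26 bits, so the hotness state, including the migrate-ready bit, fits in 32 bits per PFN instead of 64: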
#define PGHOT_MIGRATE_READY 31
#define PGHOT_NID_WIDTH 10
#define PGHOT_FREQ_WIDTH 3
/* time is stored in 13 bits which can represent up to 8.19s with HZ=1000 */
#define PGHOT_TIME_WIDTH 13
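A compile-time check of the proposed 32-bit layout could look like the sketch below; it uses only the widths suggested above, and pghot_time_span_ms() is a hypothetical helper. Since the field counts jiffies, the window scales with HZ: with HZ=250 the same 13 bits would span about 32.8s.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed layout: the whole hotness state in one 32-bit word per PFN. */
#define PGHOT_MIGRATE_READY 31
#define PGHOT_NID_WIDTH     10
#define PGHOT_FREQ_WIDTH    3
#define PGHOT_TIME_WIDTH    13

#define PGHOT_NID_SHIFT  0
#define PGHOT_FREQ_SHIFT (PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
#define PGHOT_TIME_SHIFT (PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)

/* nid + freq + time occupy bits 0-25, so they cannot reach bit 31. */
static_assert(PGHOT_TIME_SHIFT + PGHOT_TIME_WIDTH <= PGHOT_MIGRATE_READY,
	      "packed fields overlap the migrate-ready bit");
static_assert(PGHOT_MIGRATE_READY < 32, "ready bit must fit in 32 bits");

/* Span of a time field of time_width bits, in ms, for a given HZ. */
static uint32_t pghot_time_span_ms(uint32_t time_width, uint32_t hz)
{
	return (uint32_t)(((uint64_t)1 << time_width) * 1000 / hz);
}
```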
+#define PGHOT_TIME_WIDTH 19
+
+#define PGHOT_NID_SHIFT 0
+#define PGHOT_FREQ_SHIFT (PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
+#define PGHOT_TIME_SHIFT (PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)
+
+#define PGHOT_NID_MASK ((1UL << PGHOT_NID_SHIFT) - 1)
+#define PGHOT_FREQ_MASK ((1UL << PGHOT_FREQ_SHIFT) - 1)
+#define PGHOT_TIME_MASK ((1UL << PGHOT_TIME_SHIFT) - 1)
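The mask generation for nid, freq and time does not look correct: each mask is derived from the field's shift instead of its width. As posted, PGHOT_NID_MASK evaluates to 0 (PGHOT_NID_SHIFT is 0), PGHOT_FREQ_MASK to 0x3ff (10 bits instead of 3) and PGHOT_TIME_MASK to 0x1fff (13 bits instead of 19). It should be: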
#define PGHOT_NID_MASK ((1UL << PGHOT_NID_WIDTH) - 1)
#define PGHOT_FREQ_MASK ((1UL << PGHOT_FREQ_WIDTH) - 1)
#define PGHOT_TIME_MASK ((1UL << PGHOT_TIME_WIDTH) - 1)
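To make the effect concrete, here is a small userspace comparison of the two definitions. It assumes fields are read back as (word >> SHIFT) & MASK, since the accessor code is not quoted here; the OLD_*/NEW_* names and get_field() are hypothetical.

```c
#include <assert.h>

#define PGHOT_NID_WIDTH  10
#define PGHOT_FREQ_WIDTH 3
#define PGHOT_TIME_WIDTH 19

#define PGHOT_NID_SHIFT  0
#define PGHOT_FREQ_SHIFT (PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
#define PGHOT_TIME_SHIFT (PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)

/* Masks as posted in v4: derived from the shift. */
#define OLD_NID_MASK  ((1UL << PGHOT_NID_SHIFT) - 1)
#define OLD_FREQ_MASK ((1UL << PGHOT_FREQ_SHIFT) - 1)
#define OLD_TIME_MASK ((1UL << PGHOT_TIME_SHIFT) - 1)

/* Masks as suggested: derived from the width. */
#define NEW_NID_MASK  ((1UL << PGHOT_NID_WIDTH) - 1)
#define NEW_FREQ_MASK ((1UL << PGHOT_FREQ_WIDTH) - 1)
#define NEW_TIME_MASK ((1UL << PGHOT_TIME_WIDTH) - 1)

static_assert(PGHOT_TIME_SHIFT + PGHOT_TIME_WIDTH <= 32, "fields fit in bits 0-31");

/* Assumed accessor pattern: shift the field down, then mask it. */
static unsigned long get_field(unsigned long word, unsigned int shift,
			       unsigned long mask)
{
	return (word >> shift) & mask;
}
```

With the shift-based masks the nid reads back as 0 and the frequency picks up low time bits; with the width-based masks every field round-trips.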
Can you please have a look?
Regards,
Alok Rathore