Re: [PATCH v2] memory tiering: Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
From: Donet Tom
Date: Wed Apr 08 2026 - 09:23:34 EST
On 4/2/26 11:54 AM, Huang, Ying wrote:
> Donet Tom <donettom@xxxxxxxxxxxxx> writes:
>
>> Hi
>>
>> On 4/2/26 8:57 AM, Huang, Ying wrote:
>>> Hi, Donet,
>>>
>>> Donet Tom <donettom@xxxxxxxxxxxxx> writes:
>>>
>>>> In the current implementation, if NUMA_BALANCING_MEMORY_TIERING is
>>>> disabled and the pages are on the lower tier, the pages may still be
>>>> promoted.
>>>>
>>>> This happens because task_numa_work() updates the last_cpupid field to
>>>> record the last access time only when NUMA_BALANCING_MEMORY_TIERING is
>>>> enabled and the folio is on the lower tier. If
>>>> NUMA_BALANCING_MEMORY_TIERING is disabled, the last_cpupid field
>>>> can retain a valid last CPU id.
>>>>
>>>> In should_numa_migrate_memory(), the decision checks whether
>>>> NUMA_BALANCING_MEMORY_TIERING is disabled, the folio is on the lower
>>>> tier, and last_cpupid is invalid. However, because last_cpupid can be
>>>> valid when NUMA_BALANCING_MEMORY_TIERING is disabled, the condition
>>>> evaluates to false and migration is allowed.
>>>>
>>>> This patch prevents promotion when NUMA_BALANCING_MEMORY_TIERING is
>>>> disabled and the folio is on the lower tier.
>>>>
>>>> Behavior before this change:
>>>> ============================
>>>> - If NUMA_BALANCING_NORMAL is enabled, migration occurs between
>>>>   nodes within the same memory tier, and promotion from lower
>>>>   tier to higher tier may also happen.
>>>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from
>>>>   lower tier to higher tier nodes is allowed.
>>>>
>>>> Behavior after this change:
>>>> ===========================
>>>> - If NUMA_BALANCING_NORMAL is enabled, migration will occur only
>>>>   between nodes within the same memory tier.
>>>> - If NUMA_BALANCING_MEMORY_TIERING is enabled, promotion from lower
>>>>   tier to higher tier nodes will be allowed.
>>>> - If both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL are
>>>>   enabled, both migration (same tier) and promotion (cross tier) are
>>>>   allowed.
>>>>
>>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>>> Signed-off-by: Donet Tom <donettom@xxxxxxxxxxxxx>
>>>> ---
>>>> v1 -> v2
>>>> ========
>>>> 1. Dropped changes in task_numa_fault() since the original changes
>>>>    already handle runtime disabling of NUMA_BALANCING_MEMORY_TIERING.
>>>>
>>>> v1 -> https://lore.kernel.org/all/20260320092251.1290207-1-donettom@xxxxxxxxxxxxx/
>>>> ---
>>>>  kernel/sched/fair.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index bf948db905ed..4b43809a3fb1 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -2024,8 +2024,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
>>>>  	this_cpupid = cpu_pid_to_cpupid(dst_cpu, current->pid);
>>>>  	last_cpupid = folio_xchg_last_cpupid(folio, this_cpupid);
>>>>  
>>>> +	/*
>>>> +	 * Do not allow promotion if NUMA_BALANCING_MEMORY_TIERING is disabled
>>>> +	 * and the pages are on the lower tier.
>>>> +	 */
>>>>  	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
>>>> -	    !node_is_toptier(src_nid) && !cpupid_valid(last_cpupid))
>>>> +	    !node_is_toptier(src_nid))
>>>>  		return false;
>>>>  
>>>>  	/*
>>>
>>> No. Even if NUMA_BALANCING_MEMORY_TIERING is disabled, we should still
>>> allow migrating pages from lower tier to higher tier via
>>> NUMA_BALANCING_NORMAL. If we have precious DDR, why waste it? This
>>> follows the semantics of NUMA_BALANCING_NORMAL before introducing
>>> NUMA_BALANCING_MEMORY_TIERING.
>>
>> Thank you for the review comments.
>>
>> One thing I am trying to understand is that page promotion
>> appears to happen regardless of whether
>> NUMA_BALANCING_MEMORY_TIERING is enabled or disabled. In that
>> case, what is the specific role of
>> NUMA_BALANCING_MEMORY_TIERING? Do we get better performance
>> when it is enabled?
>
> You can search NUMA_BALANCING_MEMORY_TIERING to find out what it does.
> We can get better performance as the original commit message says.
>
> When NUMA_BALANCING_MEMORY_TIERING was introduced, we didn't change the
> original behavior of NUMA_BALANCING_NORMAL because we had no good
> reason to do that. In fact, you change its behavior, so you should
> provide some supporting data or bug report to justify the change.
>
>> My initial understanding was that disabling
>> NUMA_BALANCING_MEMORY_TIERING could be used to turn off
>> promotion. However, it seems that currently we cannot control
>> promotion independently. If NUMA_BALANCING_NORMAL is disabled,
>> neither migration nor promotion happens, and if it is enabled,
>> both migration and promotion can occur.
>>
>> I was under the impression that:
>>
>> - NUMA_BALANCING_NORMAL would handle migration within the same tier,
>> - NUMA_BALANCING_MEMORY_TIERING would handle promotion across tiers,
>> - and enabling both would allow both migration and promotion.
>>
>> This would provide more fine-grained control. Is my
>> understanding correct, or am I missing something here?
>
> You can change this, if you have some supporting data or bug report.
Thanks for the clarification. I was running some experiments where I only required migration, not promotion. However, I observed that promotion was still occurring even when NUMA_BALANCING_MEMORY_TIERING was disabled, which led me to believe it might be a bug, so I reported it.
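To make the case I observed concrete, here is a minimal userspace sketch of the early-return check in should_numa_migrate_memory(), before and after the patch. It is not kernel code: struct fault_state and the two helper functions are hypothetical names introduced only for illustration, and the sketch models just this one check, not the rest of the migration decision.

```c
#include <stdbool.h>

/* Mode bits, mirroring the kernel's NUMA balancing mode flags. */
#define NUMA_BALANCING_DISABLED        0x0
#define NUMA_BALANCING_NORMAL          0x1
#define NUMA_BALANCING_MEMORY_TIERING  0x2

/*
 * Hypothetical stand-in for the inputs of the early-return check.
 * The field names are assumptions made for this sketch.
 */
struct fault_state {
	int mode;                /* models sysctl_numa_balancing_mode */
	bool src_is_toptier;     /* models node_is_toptier(src_nid) */
	bool last_cpupid_valid;  /* models cpupid_valid(last_cpupid) */
};

/* Check before the patch: true means the migration is rejected early. */
static bool rejects_before_patch(const struct fault_state *s)
{
	return !(s->mode & NUMA_BALANCING_MEMORY_TIERING) &&
	       !s->src_is_toptier && !s->last_cpupid_valid;
}

/* Check after the patch: the last_cpupid validity test is dropped. */
static bool rejects_after_patch(const struct fault_state *s)
{
	return !(s->mode & NUMA_BALANCING_MEMORY_TIERING) &&
	       !s->src_is_toptier;
}
```

With mode set to NUMA_BALANCING_NORMAL only, a lower-tier source node and a valid last_cpupid, the first helper does not reject the migration while the second one does. That is the situation I hit in my experiments.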
As I understand it, enabling both NUMA_BALANCING_MEMORY_TIERING and NUMA_BALANCING_NORMAL results in both promotion and migration. Given this, do you see any concerns with modifying the behavior of NUMA_BALANCING_NORMAL?
With this patch, we would have better control over enabling and disabling promotion independently. I would appreciate your thoughts on this.
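As a rough summary of the control I have in mind, the sketch below maps a balancing mode to the actions it permits, with and without the patch. It only restates the behavior lists from the commit message; struct balancing_policy and both functions are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

/* Mode bits, mirroring the kernel's NUMA balancing mode flags. */
#define NUMA_BALANCING_NORMAL          0x1
#define NUMA_BALANCING_MEMORY_TIERING  0x2

/* Hypothetical summary type for this sketch, not a kernel structure. */
struct balancing_policy {
	bool same_tier_migration;
	bool promotion;
};

/* Behavior without the patch: NUMA_BALANCING_NORMAL may also promote. */
static struct balancing_policy policy_without_patch(int mode)
{
	struct balancing_policy p = {
		.same_tier_migration = (mode & NUMA_BALANCING_NORMAL) != 0,
		.promotion = (mode & (NUMA_BALANCING_NORMAL |
				      NUMA_BALANCING_MEMORY_TIERING)) != 0,
	};
	return p;
}

/* Behavior with the patch: each bit controls exactly one action. */
static struct balancing_policy policy_with_patch(int mode)
{
	struct balancing_policy p = {
		.same_tier_migration = (mode & NUMA_BALANCING_NORMAL) != 0,
		.promotion = (mode & NUMA_BALANCING_MEMORY_TIERING) != 0,
	};
	return p;
}
```

The only mode where the two functions differ is NUMA_BALANCING_NORMAL alone, which is exactly the behavior change under discussion.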
-Donet
> ---
> Best Regards,
> Huang, Ying