[PATCH] mm/vmstat: flush per-cpu node stats when a node goes offline
From: Gregory Price
Date: Sat Jun 27 2026 - 03:31:25 EST
A per-node vmstat counter is pgdat->vm_stat[] plus per-cpu deltas.
A balanced counter can sit split as global=+N / per-cpu=-N.
The folds that reconcile the split only walk online nodes, so once
try_offline_node() marks a node offline, any remaining per-cpu deltas
are stranded.
A subsequent online zeroes the per-cpu area but not pgdat->vm_stat[],
orphaning the +N permanently. All NR_VM_NODE_STAT_ITEMS are affected.
Flush the deltas before the node leaves the online set. A remote fold
would race with the periodic per-cpu fold, so run the flush as per-cpu
work on each CPU.
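For illustration only, the stranding and the effect of flushing first
can be modeled in userspace. This is a toy sketch, not kernel code:
struct node_counter, fold(), simulate_cycle() and MODEL_NR_CPUS are
hypothetical names, and the model assumes a single counter whose +5
already reached the global value while the matching -5 still sits in a
per-cpu delta.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 2	/* hypothetical: CPUs in the toy model */

/* Toy stand-in for one per-node counter. */
struct node_counter {
	long global;			/* models pgdat->vm_stat[item] */
	long pcpu[MODEL_NR_CPUS];	/* models the per-cpu deltas */
	bool online;			/* models node online state */
};

/* Models a fold that only walks online nodes. */
static void fold(struct node_counter *n)
{
	if (!n->online)
		return;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		n->global += n->pcpu[cpu];
		n->pcpu[cpu] = 0;
	}
}

/*
 * Run one offline/online cycle on a balanced counter that is split
 * as global=+5 / per-cpu=-5, and return the global value afterwards.
 */
long simulate_cycle(bool flush_before_offline)
{
	struct node_counter n = { .global = 5, .pcpu = { 0, -5 },
				  .online = true };

	if (flush_before_offline)
		fold(&n);		/* the ordering this patch proposes */

	n.online = false;		/* node leaves the online set */
	fold(&n);			/* later fold skips the node */

	/* Re-online: per-cpu area is zeroed, global value is kept. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		n.pcpu[cpu] = 0;
	n.online = true;
	fold(&n);

	return n.global;
}
```

In this model simulate_cycle(false) returns 5, the orphaned residual,
while simulate_cycle(true) returns 0.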
Discovered when a node/compact call hung on a nearly empty node because
the math that determines throttling broke. Reproduced by repeated
memory hotplug/unplug cycles on a node under pressure: NR_ISOLATED_ANON
ratchets up and never returns to zero.
Fixes: 75ef71840539 ("mm, vmstat: add infrastructure for per-node vmstats")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Gregory Price <gourry@xxxxxxxxxx>
---
include/linux/vmstat.h | 2 ++
mm/memory_hotplug.c | 5 ++++-
mm/vmstat.c | 10 ++++++++++
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 3c9c266cf782..ea1017427811 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -293,6 +293,7 @@ extern void __dec_node_state(struct pglist_data *, enum node_stat_item);
void quiet_vmstat(void);
void cpu_vm_stats_fold(int cpu);
+void sync_vm_stats(void);
void refresh_zone_stat_thresholds(void);
void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *);
@@ -397,6 +398,7 @@ static inline void __dec_node_page_state(struct page *page,
static inline void refresh_zone_stat_thresholds(void) { }
static inline void cpu_vm_stats_fold(int cpu) { }
+static inline void sync_vm_stats(void) { }
static inline void quiet_vmstat(void) { }
static inline void vmstat_flush_workqueue(void) { }
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7d60a7dd1e7b..10f676566f56 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -2338,8 +2338,11 @@ void try_offline_node(int nid)
/*
* all memory/cpu of this node are removed, we can offline this
- * node now.
+ * node now. Fold any pending per-cpu vmstat diffs into the global
+ * counters first: once the node leaves the online set the periodic
+ * fold skips it, orphaning the residual on a later online.
*/
+ sync_vm_stats();
node_set_offline(nid);
unregister_node(nid);
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f534972f517d..ad77343212d3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -941,6 +941,16 @@ void cpu_vm_stats_fold(int cpu)
fold_diff(global_zone_diff, global_node_diff);
}
+static void vmstat_fold_work(struct work_struct *w)
+{
+ refresh_cpu_vm_stats(false);
+}
+
+void sync_vm_stats(void)
+{
+ schedule_on_each_cpu(vmstat_fold_work);
+}
+
/*
* this is only called if !populated_zone(zone), which implies no other users of
* pset->vm_stat_diff[] exist.
--
2.53.0-Meta