Re: [PATCH v2] mm/vmstat: fold stranded per-cpu node stats when a node comes online
From: Andrew Morton
Date: Tue Jun 30 2026 - 18:55:39 EST

On Tue, 30 Jun 2026 16:57:43 -0400 Gregory Price <gourry@xxxxxxxxxx> wrote:

> On Sat, Jun 27, 2026 at 04:10:07PM -0700, Andrew Morton wrote:
> > On Sat, 27 Jun 2026 16:22:43 -0400 Gregory Price <gourry@xxxxxxxxxx> wrote:
> >
> > > + struct per_cpu_nodestat *p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
> > >
> > > - p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
> > > + for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
> >
> > and that's a lot of items.
> >
> > I guess the overall loop count won't be large enough to cause issues,
> > but it's large!
> >
> > Perhaps there's some simple test we can do on the per_cpu_nodestat to
> > avoid the inner loop? Perhaps might need to add a field for this?
> >
>
> I took a look, but that would involve adding another per-cpu field and
> then making sure all the races on that field are respected as well.
>
> Not sure it's worth it for such an extremely rare event.
>
> I can try to get clever on the folding logic if you'd like, let me know.
>
> > btw, "for(int i..." is allowed nowadays. It'll make this code nicer, IMO.
> >
>
> Otherwise I can send you a respin for this.

Is OK, we could make this change in a million other places.

> > And... Sashiko seems to have found a pre-existing issue:
> > https://sashiko.dev/#/patchset/20260627202243.758289-1-gourry@xxxxxxxxxx
> >
>
> Incoming patch for this shortly. Pretty trivial.

Cool, what was the Subject?

I'll queue this patch in mm-hotfixes for some testing while we await
further review (please).

From: Gregory Price <gourry@xxxxxxxxxx>
Subject: mm/vmstat: fold stranded per-cpu node stats when a node comes online
Date: Sat, 27 Jun 2026 16:22:43 -0400

A per-node vmstat counter is pgdat->vm_stat[] plus per-cpu deltas. A
balanced counter can sit split as global=+N / per-cpu=-N.

The folds reconciling the split only walk online nodes, so when
try_offline_node() marks a node offline the per-cpu deltas are stranded.
A subsequent online resets the per-cpu area but not pgdat->vm_stat[],
orphaning the +N permanently. All NR_VM_NODE_STAT_ITEMS are affected.

The existing code zeroes the per-cpu counters and causes a permanent skew.
Fold the stranded deltas instead, before the node rejoins the online set.
The node is not online yet and the hotplug lock is held, so the remote
access to per-cpu values is safe.

Discovered when node compaction hung for a nearly empty node, as the math
to determine throttling broke. Reproduced by repeated memory
hotplug/unplug cycles on a node under pressure: NR_ISOLATED_ANON ratchets
up and never returns to zero.

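
For readers who want to see the arithmetic, here is a minimal userspace
sketch of the skew described above. It is not kernel code: struct
node_model, NR_MODEL_CPUS, NR_MODEL_ITEMS and every function name below are
hypothetical stand-ins, and locking, the online mask and the real per-cpu
accessors are all left out. It only models a counter split as global=+N /
per-cpu=-N and compares "zero the per-cpu area" with "fold, then zero".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_MODEL_CPUS  4	/* hypothetical CPU count for the model */
#define NR_MODEL_ITEMS 3	/* stands in for NR_VM_NODE_STAT_ITEMS */

struct node_model {
	long vm_stat[NR_MODEL_ITEMS];			/* models pgdat->vm_stat[] */
	signed char diff[NR_MODEL_CPUS][NR_MODEL_ITEMS];	/* models the per-cpu deltas */
};

/* True value of one item: the global part plus every per-cpu delta. */
static long item_total(const struct node_model *n, int item)
{
	long sum = n->vm_stat[item];

	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		sum += n->diff[cpu][item];
	return sum;
}

/* A balanced counter that is split as global=+5 / per-cpu=-5. */
static struct node_model stranded_node(void)
{
	struct node_model n = { 0 };

	n.vm_stat[0] = 5;	/* +N already folded into the global part */
	n.diff[2][0] = -5;	/* -N still sitting unfolded on CPU 2 */
	return n;
}

/* Models the old behaviour: zero the per-cpu area, dropping the deltas. */
static void online_reset_only(struct node_model *n)
{
	memset(n->diff, 0, sizeof(n->diff));
}

/* Models the patched behaviour: fold each delta into the global, then zero. */
static void online_fold_then_reset(struct node_model *n)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		for (int i = 0; i < NR_MODEL_ITEMS; i++)
			if (n->diff[cpu][i])
				n->vm_stat[i] += n->diff[cpu][i];
		memset(n->diff[cpu], 0, sizeof(n->diff[cpu]));
	}
}

/* Prints the skew left behind by each strategy for item 0. */
static void show_skew(void)
{
	struct node_model a = stranded_node();
	struct node_model b = stranded_node();

	assert(item_total(&a, 0) == 0);	/* balanced before the node comes online */
	online_reset_only(&a);
	online_fold_then_reset(&b);
	printf("reset only:      %ld\n", item_total(&a, 0));
	printf("fold then reset: %ld\n", item_total(&b, 0));
}
```

Calling show_skew() from a main() shows the reset-only path leaving the +5
orphaned in the global part, while the fold path brings the counter back to
zero. Repeating the offline/online cycle with the reset-only path would add
a fresh orphan each time, which matches the ratcheting described above.
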
Link: https://lore.kernel.org/20260627202243.758289-1-gourry@xxxxxxxxxx
Fixes: 75ef71840539 ("mm, vmstat: add infrastructure for per-node vmstats")
Signed-off-by: Gregory Price <gourry@xxxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Mike Rapoport <rppt@xxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---
mm/mm_init.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/mm_init.c~mm-vmstat-fold-stranded-per-cpu-node-stats-when-a-node-comes-online
+++ a/mm/mm_init.c
@@ -1540,7 +1540,7 @@ void __ref free_area_init_core_hotplug(s
 {
 	int nid = pgdat->node_id;
 	enum zone_type z;
-	int cpu;
+	int cpu, i;
 
 	pgdat_init_internals(pgdat);
 
@@ -1558,10 +1558,17 @@ void __ref free_area_init_core_hotplug(s
 	pgdat->node_start_pfn = 0;
 	pgdat->node_present_pages = 0;
 
-	for_each_online_cpu(cpu) {
-		struct per_cpu_nodestat *p;
+	/*
+	 * Hot-unplug can leave per-cpu vmstat deltas unfolded (folders skip
+	 * offline nodes) - reconcile this at online. Foreign access to counters
+	 * is safe: the node is not online yet and we hold the hotplug lock.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct per_cpu_nodestat *p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
 
-		p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
+		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
+			if (p->vm_node_stat_diff[i])
+				node_page_state_add(p->vm_node_stat_diff[i], pgdat, i);
 		memset(p, 0, sizeof(*p));
 	}
_