Re: [PATCH v2] mm/vmstat: fold stranded per-cpu node stats when a node comes online

From: Gregory Price

Date: Sat Jun 27 2026 - 20:31:34 EST

On Sat, Jun 27, 2026 at 04:10:07PM -0700, Andrew Morton wrote:
>
> > The existing code zeroes the per-cpu counters and causes a permanent
> > skew. Fold the stranded deltas instead, before the node rejoins the
> > online set. The node is not online yet and the hotplug lock is held,
> > so the remote access to per-cpu values is safe.
>
> Oh. Shouldn't we be doing this during offlining?
>

I tried this first. I was unable to convince myself there was a
safe way to accomplish this.

1) sashiko pointed out we can't schedule_on_each_cpu while holding
the hotplug lock because we'll re-take cpus_read_lock and cause
a deadlock condition with cpu-hotplug.

2) I'm not sure we can do it after the hotplug lock as been dropped,
at least not safely. At the very least another hot-plug re-adding
the node could start. That just seemed like a bad path.

3) foreign cpu access to the per-cpu values are not atomic with
respect to in-flight folds on the target cpu. this_cpu_xchg
and this_cpu_add are (i believe) only atomic wrt the cpu itself
(can't be interrupted mid-exchange).

doing it before node_offline() has problems (in-flight folds),
doing it after node_offline() still *technically* carries the same
in-flight fold risk - just narrower (fold has to have started already).

I couldn't convince myself there wasn't still a race, so here we are.

> > + for_each_possible_cpu(cpu) {
>
> That's a lot of CPUs
>

Unfortunately - cpus may have gone offline while the node was offline,
so we legitimately have to visit every *possible* cpu :[

> > + struct per_cpu_nodestat *p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
> >
> > - p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu);
> > + for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
>
> and that's a lot of items.

I am aware :[. I suppose we could vectorize the collection here on some
archs, but I try to avoid being clever where I can.

>
> I guess the overall loop count won't be large enough to cause issues,
> but it's large!
>
> Perhaps there's some simple test we can do on the per_cpu_nodestat to
> avoid the inner loop? Perhaps might need to add a field for this?

Hadn't considered this, but maybe. Will take a look.

>
> btw, "for(int i..." is allowed nowadays. It'll make this code nicer, IMO.
>

aye aye o7

> And... Sashiko seems to have found a pre-existing issue:
> https://sashiko.dev/#/patchset/20260627202243.758289-1-gourry@xxxxxxxxxx
>

Will take a look, thanks!

~Gregory