Re: [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat

From: Roman Gushchin
Date: Mon Feb 03 2020 - 13:25:30 EST


On Mon, Feb 03, 2020 at 12:58:18PM -0500, Johannes Weiner wrote:
> On Mon, Jan 27, 2020 at 09:34:37AM -0800, Roman Gushchin wrote:
> > Currently s8 type is used for per-cpu caching of per-node statistics.
> > It works fine because the overfill threshold can't exceed 125.
> >
> > But if some counters are in bytes (and the next commit in the series
> > will convert slab counters to bytes), it's not gonna work:
> > value in bytes can easily exceed s8 without exceeding the threshold
> > converted to bytes. So to avoid overfilling per-cpu caches and breaking
> > vmstats correctness, let's use s32 instead.
> >
> > This doesn't affect per-zone statistics. There are no plans to use
> > zone-level byte-sized counters, so no reasons to change anything.
>
> Wait, is this still necessary? AFAIU, the node counters will account
> full slab pages, including free space, and only the memcg counters
> that track actual objects will be in bytes.
>
> Can you please elaborate?

It's weird to have a counter with the same name (e.g. NR_SLAB_RECLAIMABLE_B)
being in different units depending on the accounting scope.
So I do convert all slab counters: global, per-lruvec,
and per-memcg to bytes.

Alternatively I can fork them, e.g. introduce per-memcg or per-lruvec
NR_SLAB_RECLAIMABLE_OBJ
NR_SLAB_UNRECLAIMABLE_OBJ
and keep global counters untouched. If going this way, I'd prefer to make
them per-memcg, because it will simplify things on charging paths:
now we do get task->mem_cgroup->obj_cgroup in the pre_alloc_hook(),
and then obj_cgroup->mem_cgroup in the post_alloc_hook() just to
bump per-lruvec counters.


Btw, I wonder if we really need per-lruvec counters at all (at least
being enabled by default). For the significant amount of users who
have a single-node machine it doesn't bring anything except performance
overhead. For those who have multiple nodes (and most likely many many
memory cgroups) it provides way too many data except for debugging
some weird mm issues.
I guess in the absolute majority of cases having global per-node + per-memcg
counters will be enough.

Thanks!