Re: [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu_nodestat

From: Johannes Weiner
Date: Mon Feb 03 2020 - 15:34:55 EST


On Mon, Feb 03, 2020 at 10:25:06AM -0800, Roman Gushchin wrote:
> On Mon, Feb 03, 2020 at 12:58:18PM -0500, Johannes Weiner wrote:
> > On Mon, Jan 27, 2020 at 09:34:37AM -0800, Roman Gushchin wrote:
> > > Currently s8 type is used for per-cpu caching of per-node statistics.
> > > It works fine because the overfill threshold can't exceed 125.
> > >
> > > But if some counters are in bytes (and the next commit in the series
> > > will convert slab counters to bytes), it's not gonna work:
> > > value in bytes can easily exceed s8 without exceeding the threshold
> > > converted to bytes. So to avoid overfilling per-cpu caches and breaking
> > > vmstats correctness, let's use s32 instead.
> > >
> > > This doesn't affect per-zone statistics. There are no plans to use
> > > zone-level byte-sized counters, so no reasons to change anything.
> >
> > Wait, is this still necessary? AFAIU, the node counters will account
> > full slab pages, including free space, and only the memcg counters
> > that track actual objects will be in bytes.
> >
> > Can you please elaborate?
>
> It's weird to have a counter with the same name (e.g. NR_SLAB_RECLAIMABLE_B)
> being in different units depending on the accounting scope.
> So I do convert all slab counters: global, per-lruvec,
> and per-memcg to bytes.

Since the node counters track allocated slab pages and the memcg
counters track allocated objects, arguably they shouldn't use the same
name anyway.

> Alternatively I can fork them, e.g. introduce per-memcg or per-lruvec
> NR_SLAB_RECLAIMABLE_OBJ
> NR_SLAB_UNRECLAIMABLE_OBJ

Can we alias them and reuse their slots?

	/* Reuse the node slab page counter items for charged objects */
	MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
	MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,
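
To make the aliasing idea concrete, here is a rough userspace sketch, not
actual kernel code. The enum below is a two-entry stand-in for the real
node_stat_item, and struct memcg_stats and memcg_stat_add() are
hypothetical names made up for illustration only. The point is just that
the memcg names index the same slots while carrying a different meaning
(bytes of charged objects rather than pages).

```c
#include <assert.h>

/* Stand-in for the kernel's enum node_stat_item; only two entries shown. */
enum node_stat_item {
	NR_SLAB_RECLAIMABLE,
	NR_SLAB_UNRECLAIMABLE,
	NR_VM_NODE_STAT_ITEMS
};

/* memcg-side names that alias the node items and reuse their slots. */
enum memcg_slab_item {
	MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
	MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,
};

/* The aliases must resolve to the same array index as the node items. */
static_assert(MEMCG_SLAB_RECLAIMABLE == NR_SLAB_RECLAIMABLE,
	      "alias must share the node item's slot");
static_assert(MEMCG_SLAB_UNRECLAIMABLE == NR_SLAB_UNRECLAIMABLE,
	      "alias must share the node item's slot");

/* Hypothetical per-memcg stat array; slab entries hold bytes here. */
struct memcg_stats {
	long stat[NR_VM_NODE_STAT_ITEMS];
};

/* Hypothetical helper: add a byte delta to one memcg stat slot. */
static inline void memcg_stat_add(struct memcg_stats *s, int idx, long bytes)
{
	s->stat[idx] += bytes;
}
```

With this, a charge path would call something like
memcg_stat_add(&stats, MEMCG_SLAB_RECLAIMABLE, size), and no new slots
need to be allocated for the object counters.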

> and keep global counters untouched. If going this way, I'd prefer to make
> them per-memcg, because it will simplify things on charging paths:
> now we do get task->mem_cgroup->obj_cgroup in the pre_alloc_hook(),
> and then obj_cgroup->mem_cgroup in the post_alloc_hook() just to
> bump per-lruvec counters.

I don't quite follow. Don't you still have to update the global
counters?

> Btw, I wonder if we really need per-lruvec counters at all (at least
> being enabled by default). For the significant amount of users who
> have a single-node machine it doesn't bring anything except performance
> overhead.

Yeah, for single-node systems we should be able to redirect everything
to the memcg counters, without allocating and tracking lruvec copies.

> For those who have multiple nodes (and most likely many many
> memory cgroups) it provides way too much data except for debugging
> some weird mm issues.
> I guess in the absolute majority of cases having global per-node + per-memcg
> counters will be enough.

Hm? Reclaim uses the lruvec counters.