Re: [External] Re: [PATCH] mm: memcontrol: optimize per-lruvec stats counter memory usage

From: Muchun Song
Date: Mon Dec 07 2020 - 10:20:53 EST


On Mon, Dec 7, 2020 at 11:09 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 07-12-20 20:56:58, Muchun Song wrote:
> > On Mon, Dec 7, 2020 at 8:36 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Sun 06-12-20 16:56:39, Muchun Song wrote:
> > > > The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), so s32 is a large
> > > > enough type for lruvec_stat_cpu. Introduce struct per_cpu_lruvec_stat
> > > > to reduce memory usage.
> > >
> > > How much savings are we talking about here? I am not deeply familiar
> > > with the pcp allocator but can it compact smaller data types much
> > > better?
> >
> > It is a percpu struct. The size of struct lruvec_stat is 304 bytes (tested
> > on linux-5.5). So we can save 304 / 2 * nproc bytes per memcg, where nproc
> > is the number of possible CPUs. If we have n memory cgroups in the
> > system, we can save (152 * nproc * n) bytes in total. In some configurations,
> > nproc may be 512. And if we have a lot of dying cgroups, n can be
> > 100,000 (I once saw that on my server).
>
> This should be part of the changelog. In general, any optimization
> should come with some numbers showing the effect of the optimization.
>
> As I've said I am not really familiar with pcp internals and how
> efficiently it can organize smaller objects. Maybe it can really halve
> the memory consumption.
>
> My only concern is that using smaller types for these counters can
> backfire later on because we have an indirect dependency between the batch
> size and the data type. In general I do not really object to the patch
> as long as the savings are non-trivial, so that we are not creating a
> potential trap for something that is practically a minuscule
> micro-optimization.
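
For the changelog, the estimate quoted above can be written out as a
small userspace sketch. It only reproduces the arithmetic from this
thread (half of the 304-byte struct saved per possible CPU per memcg);
saved_bytes() is a made-up helper name, not a kernel function, and the
304-byte figure is the one reported for linux-5.5.

```c
#include <stdint.h>

/*
 * Assumption taken from the thread: struct lruvec_stat is 304 bytes,
 * and switching the per-cpu counters to s32 saves half of that.
 */
#define LRUVEC_STAT_SIZE	304ULL
#define SAVED_PER_CPU		(LRUVEC_STAT_SIZE / 2)

/* Hypothetical helper: total bytes saved for nproc CPUs and n memcgs. */
static uint64_t saved_bytes(uint64_t nproc, uint64_t nr_memcg)
{
	return SAVED_PER_CPU * nproc * nr_memcg;
}
```

With nproc = 512 and n = 100,000, saved_bytes(512, 100000) is
7,782,400,000 bytes, i.e. roughly 7.2 GiB.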

There is a similar structure named struct per_cpu_nodestat.

struct per_cpu_nodestat {
	s8 stat_threshold;
	s8 vm_node_stat_diff[NR_VM_NODE_STAT_ITEMS];
};

s8 is enough for the per-node vmstat counters. That also depends on
the batch size, and it has been s8 for a long time. Why would s32 not be
suitable for the per-memcg vmstat counters? They are very similar, right?
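
To illustrate why the stored per-cpu value stays bounded by the batch
size, here is a simplified userspace sketch of the update pattern as I
understand it. It is not the kernel code: struct batched_counter,
mod_counter() and BATCH are illustrative names, BATCH stands in for
MEMCG_CHARGE_BATCH, and a plain long replaces the shared atomic counter.

```c
#include <stdint.h>
#include <stdlib.h>

#define BATCH 32	/* stands in for MEMCG_CHARGE_BATCH (assumption) */

struct batched_counter {
	long total;		/* stands in for the shared counter */
	int32_t pcp_diff;	/* per-cpu delta; |pcp_diff| <= BATCH */
};

/* Add val to the counter, folding the per-cpu delta when it exceeds BATCH. */
static void mod_counter(struct batched_counter *c, int val)
{
	/* Do the sum in a wide type so it cannot overflow the small one. */
	long x = (long)val + c->pcp_diff;

	if (labs(x) > BATCH) {
		c->total += x;	/* flush to the shared counter */
		x = 0;
	}

	/* Here |x| <= BATCH, so it fits as long as the type covers BATCH. */
	c->pcp_diff = (int32_t)x;
}
```

The value written back never exceeds BATCH in magnitude, no matter how
large val is. That is the dependency between the batch size and the
data type: the type only needs to cover the threshold, as with s8 and
stat_threshold in per_cpu_nodestat.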

Thanks.

> --
> Michal Hocko
> SUSE Labs



--
Yours,
Muchun