Re: [PATCH v2 2/7] memcg: dynamically allocate lruvec_stats

From: Shakeel Butt
Date: Mon Apr 29 2024 - 15:46:47 EST


On Mon, Apr 29, 2024 at 08:50:11AM -0700, Roman Gushchin wrote:
> On Fri, Apr 26, 2024 at 05:37:28PM -0700, Shakeel Butt wrote:
[...]
> > +unsigned long lruvec_page_state_local(struct lruvec *lruvec,
> > + enum node_stat_item idx)
> > +{
> > + struct mem_cgroup_per_node *pn;
> > + long x = 0;
> > +
> > + if (mem_cgroup_disabled())
> > + return node_page_state(lruvec_pgdat(lruvec), idx);
> > +
> > + pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> > + x = READ_ONCE(pn->lruvec_stats->state_local[idx]);
> > +#ifdef CONFIG_SMP
> > + if (x < 0)
> > + x = 0;
> > +#endif
>
> Not directly related to your change, but do we still need it? And if yes,
> do we really care about !CONFIG_SMP case enough to justify these #ifdefs?
>

That's a good question, and I think this is still needed. Particularly
on large machines with a large number of CPUs, we can have a situation
where the flusher is currently flushing CPU 100 while, in parallel, some
workload allocates a lot of pages on, say, CPU 0 and frees them on
CPU 200. A reader aggregating the partially flushed state can then see a
transiently negative value, which is why the clamp to zero is kept.

> > + return x;
> > +}
> > +
> > /* Subset of vm_event_item to report for memcg event stats */
> > static const unsigned int memcg_vm_event_stat[] = {
> > PGPGIN,
> > @@ -5492,18 +5546,25 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
> > if (!pn)
> > return 1;
> >
> > + pn->lruvec_stats = kzalloc_node(sizeof(struct lruvec_stats), GFP_KERNEL,
> > + node);
>
> Why not GFP_KERNEL_ACCOUNT?
>

Previously struct lruvec_stats was part of struct mem_cgroup_per_node,
and we use GFP_KERNEL to allocate struct mem_cgroup_per_node. I kept the
behavior the same, and if we want to switch to GFP_KERNEL_ACCOUNT, I
think that should be a separate patch.

Thanks for the review.
Shakeel