Re: [PATCH v2] memcg: use ratelimited stats flush in the reclaim
From: Shakeel Butt
Date: Wed Aug 14 2024 - 20:30:01 EST
On Wed, Aug 14, 2024 at 04:48:42PM GMT, Yosry Ahmed wrote:
> On Wed, Aug 14, 2024 at 4:42 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> >
> > On Wed, Aug 14, 2024 at 04:03:13PM GMT, Nhat Pham wrote:
> > > On Wed, Aug 14, 2024 at 9:32 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > > >
> > > >
> > > > Ccing Nhat
> > > >
> > > > On Wed, Aug 14, 2024 at 02:57:38PM GMT, Jesper Dangaard Brouer wrote:
> > > > > I suspect the next whac-a-mole will be the rstat flush for the slab code
> > > > > that kswapd also activates via shrink_slab, which, via
> > > > > shrinker->count_objects(), invokes count_shadow_nodes().
> > > > >
> > > >
> > > > Actually count_shadow_nodes() is already using the ratelimited version.
> > > > However, zswap_shrinker_count() is still using the sync version. Nhat is
> > > > modifying this code at the moment and we can ask if we really need the
> > > > most accurate values for MEMCG_ZSWAP_B and MEMCG_ZSWAPPED for the zswap
> > > > writeback heuristic.
> > >
> > > You are referring to this, correct?
> > >
> > > mem_cgroup_flush_stats(memcg);
> > > nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > > nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> > >
> > > It's already a bit less-than-accurate - as you pointed out in another
> > > discussion, it takes into account the objects and sizes of the entire
> > > subtree, rather than just the ones charged to the current (memcg,
> > > node) combo. Feel free to optimize this away!
> > >
> > > In fact, I should probably replace this with another (atomic?) counter
> > > in zswap_lruvec_state struct, which tracks the post-compression size.
> > > That way, we'll have a better estimate of the compression factor -
> > > total post-compression size / (length of LRU * page size), and
> > > perhaps avoid the whole stat flushing path altogether...
> > >
> >
> > That sounds like a much better solution than relying on rstat for accurate
> > stats.
>
> We can also use such atomic counters in obj_cgroup_may_zswap() and
> eliminate the rstat flush there as well. Same for zswap_current_read()
> probably.
>
> Most in-kernel flushers really only need a few stats, so I am
> wondering if it's better to incrementally move these ones outside of
> the rstat framework and completely eliminate in-kernel flushers. For
> instance, MGLRU does not require the flush that reclaim does as
> Shakeel pointed out.
>
> This will solve so many scalability problems that all of us have
> observed at some point or another and tried to optimize. I believe
> using rstat for userspace reads was the original intention anyway.

I like this direction and I think zswap would be a good first target.
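
To make the counter idea from the quoted discussion concrete, here is a
rough userspace sketch, not kernel code. It keeps a pair of atomic
counters per LRU and derives the compression factor as total
post-compression size / (length of LRU * page size), with no stats
flush involved. Everything named here (struct lruvec_zswap_sketch, the
sketch_*() helpers, SKETCH_PAGE_SIZE) is hypothetical and only stands in
for whatever would actually be added to zswap_lruvec_state.

```c
#include <stdatomic.h>

/* Assumed page size for this sketch; the kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096L

/* Hypothetical stand-in for counters added to zswap_lruvec_state. */
struct lruvec_zswap_sketch {
	atomic_long nr_stored;        /* entries (pages) currently on the LRU */
	atomic_long compressed_bytes; /* total post-compression size in bytes */
};

/* Account one stored page whose compressed size is compressed_len. */
static void sketch_store(struct lruvec_zswap_sketch *s, long compressed_len)
{
	atomic_fetch_add_explicit(&s->nr_stored, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->compressed_bytes, compressed_len,
				  memory_order_relaxed);
}

/* Undo the accounting when an entry is freed or written back. */
static void sketch_free(struct lruvec_zswap_sketch *s, long compressed_len)
{
	atomic_fetch_sub_explicit(&s->nr_stored, 1, memory_order_relaxed);
	atomic_fetch_sub_explicit(&s->compressed_bytes, compressed_len,
				  memory_order_relaxed);
}

/*
 * Compression factor as an integer percentage:
 * compressed bytes / (LRU length * page size), rounded down.
 * The two loads are not a consistent snapshot, which is assumed to be
 * acceptable for a heuristic.
 */
static long sketch_compression_pct(struct lruvec_zswap_sketch *s)
{
	long stored = atomic_load_explicit(&s->nr_stored,
					   memory_order_relaxed);
	long bytes = atomic_load_explicit(&s->compressed_bytes,
					  memory_order_relaxed);

	if (stored <= 0)
		return 0;
	return (bytes * 100) / (stored * SKETCH_PAGE_SIZE);
}
```

The point of the sketch is only that the reader touches two local
counters instead of flushing the rstat tree. Whether plain atomics are
cheap enough on the store/free paths is an open question.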