Re: [PATCH] zswap: do not shrink when memory.zswap.max is 0

From: Nhat Pham
Date: Tue May 30 2023 - 14:27:46 EST


On Tue, May 30, 2023 at 9:53 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Tue, May 30, 2023 at 9:22 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > Before storing a page, zswap first checks if the number of stored pages
> > exceeds the limit specified by memory.zswap.max, for each cgroup in the
> > hierarchy. If this limit is reached or exceeded, then zswap shrinking is
> > triggered and short-circuits the store attempt.
> >
> > However, if memory.zswap.max = 0 for a cgroup, no amount of writeback
> > will allow future store attempts from processes in this cgroup to
> > succeed. Furthermore, this create a pathological behavior in a system
> > where some cgroups have memory.zswap.max = 0 and some do not: the
> > processes in the former cgroups, under memory pressure, will evict pages
> > stored by the latter continually, until the need for swap ceases or the
> > pool becomes empty.
> >
> > As a result of this, we observe a disproportionate amount of zswap
> > writeback and a perpetually small zswap pool in our experiments, even
> > though the pool limit is never hit.
> >
> > This patch fixes the issue by rejecting zswap store attempt without
> > shrinking the pool when memory.zswap.max is 0.
> >
> > Fixes: f4840ccfca25 ("zswap: memcg accounting")
> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
> > ---
> > include/linux/memcontrol.h | 6 +++---
> > mm/memcontrol.c | 8 ++++----
> > mm/zswap.c | 9 +++++++--
> > 3 files changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 222d7370134c..507bed3a28b0 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -1899,13 +1899,13 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
> > #endif /* CONFIG_MEMCG_KMEM */
> >
> > #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> > void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
> > void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
> > #else
> > -static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +static inline int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > {
> > - return true;
> > + return 0;
> > }
> > static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg,
> > size_t size)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4b27e245a055..09aad0e6f2ea 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -7783,10 +7783,10 @@ static struct cftype memsw_files[] = {
> > * spending cycles on compression when there is already no room left
> > * or zswap is disabled altogether somewhere in the hierarchy.
> > */
> > -bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > +int obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > {
> > struct mem_cgroup *memcg, *original_memcg;
> > - bool ret = true;
> > + int ret = 0;
> >
> > if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > return true;
> > @@ -7800,7 +7800,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > if (max == PAGE_COUNTER_MAX)
> > continue;
> > if (max == 0) {
> > - ret = false;
> > + ret = -ENODEV;
> > break;
> > }
> >
> > @@ -7808,7 +7808,7 @@ bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> > pages = memcg_page_state(memcg, MEMCG_ZSWAP_B) / PAGE_SIZE;
> > if (pages < max)
> > continue;
> > - ret = false;
> > + ret = -ENOMEM;
> > break;
> > }
> > mem_cgroup_put(original_memcg);
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 59da2a415fbb..7b13dc865438 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -1175,8 +1175,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> > }
> >
> > objcg = get_obj_cgroup_from_page(page);
> > - if (objcg && !obj_cgroup_may_zswap(objcg))
> > - goto shrink;
> > + if (objcg) {
> > + ret = obj_cgroup_may_zswap(objcg);
> > + if (ret == -ENODEV)
> > + goto reject;
> > + if (ret == -ENOMEM)
> > + goto shrink;
> > + }
>
> I wonder if we should just make this:
>
> if (objcg && !obj_cgroup_may_zswap(objcg))
> goto reject;
>
> Even if memory.zswap.max is > 0, if the limit is hit, shrinking the
> zswap pool will only help if we happen to writeback a page from the
> same memcg that hit its limit. Keep in mind that we will only
> writeback one page every time we observe that the limit is hit (even
> with Domenico's patch, because zswap_can_accept() should be true).
>
> On a system with a handful of memcgs,
> it seems likely that we wrongfully writeback pages from other memcgs
> because of this. Achieving nothing for this memcg, while hurting
> others. OTOH, without invoking writeback when the limit is hit, the
> memcg will just not be able to use zswap until some pages are
> faulted back in or invalidated.
>
> I am not sure which is better, just thinking out loud.
>
> Seems like this can be solved by having per-memcg LRUs, or at least
> providing an argument to the shrinker of which memcg to reclaim from.
> This would only be possible when the LRU is moved to zswap.

I totally agree! This seems like the logical next step in zswap's evolution.
I actually proposed this fix with this future development in mind - with
a per-memcg LRU, we can trigger memcg-specific shrinking in
place of this indiscriminate writeback. It seems less drastic a change
(compared to removing shrinking here now, then reintroducing it later).

Thanks for the feedback, Yosry!

>
>
> >
> > /* reclaim space if needed */
> > if (zswap_is_full()) {
> > --
> > 2.34.1
> >
> >