Re: [PATCH v4 2/5] mm/zswap: Factor writeback loop out of shrink_worker()
From: Hao Jia
Date: Fri Jun 26 2026 - 05:35:41 EST
On 2026/6/26 01:59, Yosry Ahmed wrote:
static long zswap_shrink_one(struct mem_cgroup *memcg,
struct zswap_shrink_state *s)
{
long shrunk;
shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
if (shrunk == -ENOENT)
return 0;
s->attempts++;
if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
s->stop = true;
Do we need 'stop' or can we just return a value here to indicate that
we should stop (e.g. -EBUSY)?
Perhaps we could return -EAGAIN instead of -EBUSY? This would align with
the semantics of the memory.reclaim interface, which returns -EAGAIN
when it reclaims fewer bytes than requested.
Hmm but -EAGAIN tells the caller to try again, while here -EAGAIN
tells the caller *not* to try again because we exhausted all retries?
Okay, let's go with -EBUSY.
I think splitting the shrink/retry logic over 2 functions makes it
more difficult to follow, so yeah I think fold
zswap_shrink_no_candidate() into zswap_shrink_one(). Then the callers
only need to iterate memcgs (depending on the context) and call
zswap_shrink_one() for each of them.
So, something like this?
Yeah, something like this :)
/* Track progress of a memcg-tree writeback walk. */
struct zswap_shrink_state {
int attempts;
While at it, I think "attempts" is really the number of scans, right?
Should we rename it? Maybe "scans" or similar?
int failures;
};
/*
* Take one step of a memcg-tree writeback walk driven by the caller's
* iterator, and fold the result into @s, the retry bookkeeping shared
* across steps. @memcg is the iterator's current memcg, or NULL once
* it has wrapped around after a full pass over the tree.
*
* The function returns -EAGAIN to signal the caller to abort the walk
* after encountering the following conditions MAX_RECLAIM_RETRIES times:
* - No writeback-candidate memcgs were found in a memcg tree walk.
* - Shrinking a writeback-candidate memcg failed.
Orthogonal to this patch, but I wonder if this can be simplified. I
wonder if these two conditions can be replaced with "shrinking a memcg
that has zswap entries failed". The "no writeback-candidate memcgs in
the tree" case seems like we should abort right away instead of
retrying?
Nhat, WDYT?
Perhaps something like the following is what you had in mind? I've drafted the implementation below to make it easier for Nhat to compare with the previous behavior.
*
* Return: The number of compressed bytes written back (>= 0), or -EAGAIN
* once the retry budget is exhausted and the caller should abort the walk.
*/
static long zswap_shrink_one(struct mem_cgroup *memcg,
Nit: zswap_shrink_one_memcg()
BTW, the existing writeback logic has been broken for a while now when
memcg is disabled. I think we constantly hit the !memcg case and run
out of retries. Not sure if your patch changes this in any way, or if
you want to fix that while you're at it :)
Yes, I'd be happy to do that. However, would it be better to submit a separate fix patch or combine it with this one?
struct zswap_shrink_state *s)
{
long shrunk;
/*
* If the iterator has completed a full pass, update the shrink state
* and check whether we should keep going.
*/
if (!memcg) {
/*
* Continue shrinking without incrementing failures if we found
* candidate memcgs in the last tree walk.
*/
if (!s->attempts && ++s->failures == MAX_RECLAIM_RETRIES)
return -EAGAIN;
s->attempts = 0;
return 0;
}
shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
/*
* There are no writeback-candidate pages in the memcg. This is not an
* issue as long as we can find another memcg with pages in zswap. Skip
* this without incrementing attempts and failures.
*/
if (shrunk == -ENOENT)
return 0;
s->attempts++;
if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
return -EAGAIN;
return shrunk;
}
static void shrink_worker(struct work_struct *w)
{
struct zswap_shrink_state s = {};
unsigned long thr;
/* Reclaim down to the accept threshold */
thr = zswap_accept_thr_pages();
while (zswap_total_pages() > thr) {
struct mem_cgroup *memcg;
long ret;
cond_resched();
memcg = zswap_iter_global();
Do we still need this helper? Or should we just keep the memcg
iteration open-coded?
Done.
ret = zswap_shrink_one(memcg, &s);
/* drop the extra reference taken by zswap_iter_global() */
mem_cgroup_put(memcg);
if (ret == -EAGAIN)
break;
}
}
/* Track progress of a memcg-tree writeback walk. */
struct zswap_shrink_state {
int scans;
int failures;
};
/*
* Take one step of a memcg-tree writeback walk driven by the caller's
* iterator, and fold the result into @s, the retry bookkeeping shared
* across steps. @memcg is the iterator's current memcg, or NULL once
* it has wrapped around after a full pass over the tree.
*
* The function returns -EBUSY to signal the caller to abort the walk when
* either of the following occurs:
* - A full pass over the tree found no writeback-candidate memcg.
* - Shrinking a writeback-candidate memcg failed MAX_RECLAIM_RETRIES times.
*
* When memory cgroup is disabled, the iterator always yields NULL. All
* zswap entries then live on the root list_lru, so NULL is treated as the
* root memcg and shrunk directly rather than as a completed tree pass.
*
* Return: The number of compressed bytes written back (>= 0), or -EBUSY
* when the caller should abort the walk.
*/
static long zswap_shrink_one_memcg(struct mem_cgroup *memcg,
struct zswap_shrink_state *s)
{
bool disabled = mem_cgroup_disabled();
long shrunk;
/*
* If the iterator has completed a full pass, update the shrink state
* and check whether we should keep going.
* With memcg disabled the iterator always yields NULL, so fall through
* and shrink the root memcg directly instead.
*/
if (!memcg && !disabled) {
/*
* Abort if no writeback-candidate memcgs in the last tree walk.
* Otherwise reset the scans count and continue.
*/
if (!s->scans)
return -EBUSY;
s->scans = 0;
return 0;
}
shrunk = shrink_memcg(memcg, NR_ZSWAP_WB_BATCH);
/*
* There are no writeback-candidate pages in the memcg. With memcg
* enabled this is not an issue as long as we can find another memcg
* with pages in zswap, so skip without counting it as a candidate.
* With memcg disabled the root LRU is the only target, so we should
* abort if it has no writeback-candidate pages.
*/
if (shrunk == -ENOENT)
return disabled ? -EBUSY : 0;
s->scans++;
if (shrunk <= 0 && ++s->failures == MAX_RECLAIM_RETRIES)
return -EBUSY;
return shrunk;
}
static void shrink_worker(struct work_struct *w)
{
struct zswap_shrink_state s = {};
unsigned long thr;
/* Reclaim down to the accept threshold */
thr = zswap_accept_thr_pages();
/*
* Global reclaim will select cgroup in a round-robin fashion from all
* online memcgs, but memcgs that have no pages in zswap and
* writeback-disabled memcgs (memory.zswap.writeback=0) are not
* candidates for shrinking.
*
* We save iteration cursor memcg into zswap_next_shrink,
* which can be modified by the offline memcg cleaner
* zswap_memcg_offline_cleanup().
*
* Since the offline cleaner is called only once, we cannot leave an
* offline memcg reference in zswap_next_shrink.
* We can rely on the cleaner only if we get online memcg under lock.
*
* If we get an offline memcg, we cannot determine if the cleaner has
* already been called or will be called later. We must put back the
* reference before returning from this function. Otherwise, the
* offline memcg left in zswap_next_shrink will hold the reference
* until the next run of shrink_worker().
*/
while (zswap_total_pages() > thr) {
struct mem_cgroup *memcg;
long ret;
cond_resched();
/*
* Start shrinking from the next memcg after zswap_next_shrink.
* When the offline cleaner has already advanced the cursor,
* advancing the cursor here overlooks one memcg, but this
* should be negligibly rare.
*
* If we get an online memcg, keep the extra reference in case
* the original one obtained by mem_cgroup_iter() is dropped by
* zswap_memcg_offline_cleanup() while we are shrinking the
* memcg.
*/
spin_lock(&zswap_shrink_lock);
do {
memcg = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
zswap_next_shrink = memcg;
} while (memcg && !mem_cgroup_tryget_online(memcg));
spin_unlock(&zswap_shrink_lock);
ret = zswap_shrink_one_memcg(memcg, &s);
/* drop the extra reference taken above */
mem_cgroup_put(memcg);
if (ret == -EBUSY)
break;
}
}
Thanks,
Hao