Re: [PATCH v2 1/2] zswap: implement a second chance algorithm for dynamic zswap shrinker

From: Yosry Ahmed
Date: Mon Aug 05 2024 - 19:58:51 EST


[..]
> > > @@ -1167,25 +1189,6 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
> > > return SHRINK_STOP;
> > > }
> > >
> > > - nr_protected =
> > > - atomic_long_read(&lruvec->zswap_lruvec_state.nr_zswap_protected);
> > > - lru_size = list_lru_shrink_count(&zswap_list_lru, sc);
> > > -
> > > - /*
> > > - * Abort if we are shrinking into the protected region.
> > > - *
> > > - * This short-circuiting is necessary because if we have too many multiple
> > > - * concurrent reclaimers getting the freeable zswap object counts at the
> > > - * same time (before any of them made reasonable progress), the total
> > > - * number of reclaimed objects might be more than the number of unprotected
> > > - * objects (i.e the reclaimers will reclaim into the protected area of the
> > > - * zswap LRU).
> > > - */
> > > - if (nr_protected >= lru_size - sc->nr_to_scan) {
> > > - sc->nr_scanned = 0;
> > > - return SHRINK_STOP;
> > > - }
> > > -
> >
> > Do we need a similar mechanism to protect against concurrent shrinkers
> > quickly consuming nr_swapins?
>
> Not for nr_swapins consumption per se - the original reason I
> included this (racy) check was just so that concurrent reclaimers do
> not disrespect the protection scheme. We had no guarantee that we
> would not reclaim into the protected region (well, technically even
> with this racy check). With the second chance scheme, a "protected"
> page (i.e. one with its referenced bit set) is not reclaimed right
> away - a shrinker encountering it has to "age" it first (by
> clearing the referenced bit), so the intended protection is enforced.
>
> That said, I do believe we need a mechanism to limit the concurrency
> here. The number of pages aged/reclaimed should scale (linearly?
> proportionally?) with the reclaim pressure, i.e. more reclaimers ==
> more pages reclaimed/aged, so the current behavior is desired.
> However, at some point, if we have more shrinkers than there is work
> to assign to each of them, we might be unnecessarily wasting
> resources (and potentially building up the nr_deferred counter that
> we discussed in v1 of the patch series). Additionally, we might be
> overshrinking in a very short amount of time, without letting the
> system have a chance to react and provide feedback (through
> swapins/refaults) to the memory reclaimers.
>
> But let's do this as a follow-up work :) It seems orthogonal to what
> we have here.

Agreed, as long as the data shows we don't regress by removing this
part, I am fine with doing this as follow-up work.

>
> > > - * Subtract the lru size by an estimate of the number of pages
> > > - * that should be protected.
> > > + * Subtract the lru size by the number of pages that are recently swapped
> >
> > nit: I don't think "subtract by" is correct, it's usually "subtract
> > from". So maybe "Subtract the number of pages that are recently
> > swapped in from the lru size"? Also, should we remain consistent about
> > mentioning that these are disk swapins throughout all the comments to
> > keep things clear?
>
> Yeah, I should be clearer here - it should be swapped in from disk,
> or more generally (accurately?) swapped in from the backing swap
> device (but the latter can change once we decouple swap from zswap).
> Or maybe swapped in from the secondary tier?
>
> Let's just not overthink and go with swapped in from disk for now :)

Agreed :)

I will take a look at the new version soon, thanks for working on this.