[PATCH v2 0/2] improving dynamic zswap shrinker protection scheme

From: Nhat Pham
Date: Tue Jul 30 2024 - 18:27:52 EST


v2:
* Add more details in comments, patch changelog, documentation, etc.
about the second chance scheme and its ability to modulate the
writeback rate (patch 1) (suggested by Yosry Ahmed).
* Move the referenced bit (patch 1) (suggested by Yosry Ahmed).

When experimenting with the memory-pressure based (i.e. "dynamic") zswap
shrinker in production, we observed a sharp increase in the number of
swapins, which led to a performance regression. We were able to trace this
regression to the following problems with the shrinker's warm pages
protection scheme:

1. The protection decays far too rapidly, and the decay is coupled with
zswap stores. This leads to anomalous patterns in which a small batch of
zswap stores effectively erases all the protection in place for the
warmer pages in the zswap LRU.

This observation has also been corroborated upstream by Takero Funaki
(in [1]).

2. We inaccurately track the number of swapped-in pages: we miss the
non-pivot pages that are part of the readahead window, while counting
the pages that are found in the zswap pool.
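
To make problem 2 concrete, here is a minimal userspace sketch of the two
ways of counting, based only on the description above. It is not kernel
code: struct swapin_page, count_swapins_old() and count_swapins_new() are
hypothetical names, and the "old" variant assumes that only the pivot
(faulting) page was counted, whether or not it came from the zswap pool.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one page brought in by a swapin fault. */
struct swapin_page {
	bool is_pivot;	/* the page that actually faulted */
	bool in_zswap;	/* served from the zswap pool, no disk read */
};

/*
 * Assumed old behaviour: count only the pivot page, even when it was
 * found in the zswap pool, and ignore the rest of the readahead window.
 */
static size_t count_swapins_old(const struct swapin_page *win, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (win[i].is_pivot)
			count++;
	return count;
}

/*
 * Assumed corrected behaviour: count every page in the window that had
 * to be read from the backing swap device, pivot or not, and skip pages
 * that were still in the zswap pool.
 */
static size_t count_swapins_new(const struct swapin_page *win, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!win[i].in_zswap)
			count++;
	return count;
}
```

For a window whose pivot page is a zswap hit and whose three readahead
neighbours come from disk, the first function reports one swapin and the
second reports three.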


To alleviate these two issues, this patch series improves the dynamic zswap
shrinker in the following manner:

1. Replace the protection size tracking scheme with a second chance
algorithm. This new scheme removes the need for haphazard stats
decaying. It automatically adjusts the pace of page aging to memory
pressure, and the writeback rate to pool activity: writeback slows down
when the pool is dominated by zswpouts, and speeds up when the pool is
dominated by stale entries.

2. Fix the tracking of the number of swapins to take into account
non-pivot pages in the readahead window.
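
The second chance idea in item 1 can be sketched in userspace C as
follows. This is an illustration under stated assumptions, not the code
in mm/zswap.c: struct lru, lru_add() and lru_shrink() are hypothetical,
the LRU is a small fixed-size ring rather than a list_lru, and
"writeback" is modelled by simply dropping the entry.

```c
#include <stdbool.h>
#include <stddef.h>

#define LRU_CAP 16

/* Hypothetical, simplified entry; not the kernel's struct zswap_entry. */
struct entry {
	int id;
	bool referenced;	/* set on store, cleared on first reclaim visit */
};

/* Ring buffer standing in for the zswap LRU; head is the coldest entry. */
struct lru {
	struct entry slots[LRU_CAP];
	size_t head;
	size_t len;
};

/* A store adds the entry at the warm end with its referenced bit set. */
static bool lru_add(struct lru *l, int id)
{
	size_t tail;

	if (l->len == LRU_CAP)
		return false;
	tail = (l->head + l->len) % LRU_CAP;
	l->slots[tail].id = id;
	l->slots[tail].referenced = true;
	l->len++;
	return true;
}

/*
 * Scan up to nr_to_scan entries from the cold end. A referenced entry
 * gets a second chance: its bit is cleared and it is rotated to the warm
 * end. An unreferenced entry is "written back" (dropped here).
 * Returns the number of entries written back.
 */
static size_t lru_shrink(struct lru *l, size_t nr_to_scan)
{
	size_t written = 0;

	while (nr_to_scan-- > 0 && l->len > 0) {
		struct entry e = l->slots[l->head];

		l->head = (l->head + 1) % LRU_CAP;
		l->len--;
		if (e.referenced) {
			e.referenced = false;
			l->slots[(l->head + l->len) % LRU_CAP] = e;
			l->len++;
		} else {
			written++;
		}
	}
	return written;
}
```

In this model a pool full of fresh stores absorbs a whole scan pass
without any writeback, while a pool of entries that have already been
visited once is written back immediately. That matches the behaviour
described above: slower writeback when zswpouts dominate, faster when
stale entries dominate.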

With these two changes in place, in a kernel-building benchmark without
any cold data added, the number of swapins is reduced by 64.12%. This
translates to a 10.32% reduction in build time. We also observe a 3%
reduction in kernel CPU time.

In another benchmark, with cold data added (to gauge the new algorithm's
ability to offload cold data), the new second chance scheme outperforms
the old protection scheme by around 0.7%, and actually writes back around
21% more pages to the backing swap device. So the new scheme is at least
as good as the old scheme on this front as well.

[1]: https://lore.kernel.org/linux-mm/CAPpodddcGsK=0Xczfuk8usgZ47xeyf4ZjiofdT+ujiyz6V2pFQ@xxxxxxxxxxxxxx/

Nhat Pham (2):
zswap: implement a second chance algorithm for dynamic zswap shrinker
zswap: increment swapin count for non-pivot swapped in pages

include/linux/zswap.h | 16 +++---
mm/page_io.c | 11 ++++-
mm/swap_state.c | 8 +--
mm/zswap.c | 110 ++++++++++++++++++++++++------------------
4 files changed, 82 insertions(+), 63 deletions(-)


base-commit: cca1345bd26a67fc61a92ff0c6d81766c259e522
--
2.43.0