[PATCH v2 0/6] mm/zswap: optimize zswap lru list

From: Chengming Zhou
Date: Sat Feb 03 2024 - 22:06:26 EST


Changes in v2:
- Add comment above zswap_invalidate() to mention that large folio
swap slot is not covered for now, per Yosry.
- Add comment about locking behaviour of LRU_STOP, per Yosry.
- Add the theory details and supportive testing results on why we
choose the exclusive load as the default for zswap, per Johannes.
- Collect tags.
- Link to v1: https://lore.kernel.org/r/20240201-b4-zswap-invalidate-entry-v1-0-56ed496b6e55@xxxxxxxxxxxxx

Hi all,

This series was motivated by observing the zswap lru list shrinking and
noticing some unexpected cases in zswap_writeback_entry():

bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'

Some writebacks fail with -ENOMEM because freeing a swap entry to the
per-cpu swap pool does not invalidate/drop its zswap entry. When the
shrinker later encounters these trashy zswap entries, they can't be
reclaimed and it returns -ENOMEM.

So move the invalidation ahead to the point where the swap entry is
freed to the per-cpu swap pool, since there is no benefit in leaving
trashy zswap entries on the zswap tree and lru list.
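
Roughly, the idea is to invalidate from the swap slot free path, along
these lines (an illustrative sketch only; the exact call site and the
zswap_invalidate() signature may differ from the actual patch):

/* mm/swap_slots.c -- illustrative sketch, not the exact patch */
static void free_swap_slot(swp_entry_t entry)
{
	struct swap_slots_cache *cache;

	/*
	 * Drop the zswap entry as soon as the swap slot is freed, so
	 * the shrinker never sees trashy entries on the lru list.
	 * Note: large folio swap slots are not covered for now.
	 */
	zswap_invalidate(swp_type(entry), swp_offset(entry));

	cache = raw_cpu_ptr(&swp_slots);
	...
}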

Another case is -EEXIST, which is seen more often with
!zswap_exclusive_loads_enabled, where a swapin folio leaves its
compressed copy on the tree and lru list. That copy can't be reclaimed
until the folio is removed from the swapcache.

Changing to zswap_exclusive_loads_enabled mode invalidates the entry at
folio swapin, which has its own drawback: if that folio stays clean in
the swapcache and is swapped out again, we need to compress it again.
Please see the commit for details on why we choose exclusive load as
the default for zswap.
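
For reference, the exclusive load path in zswap_load() looks roughly
like the following (a simplified sketch; zswap_invalidate_entry() is
the internal helper as I read it in mm/zswap.c, details may differ):

/* mm/zswap.c -- simplified sketch of the exclusive load path */
bool zswap_load(struct folio *folio)
{
	...
	/* decompress the zswap entry into the folio */
	...

	/*
	 * Exclusive load: drop the compressed copy now that the folio
	 * holds the data, and mark the folio dirty so it is compressed
	 * again if it gets reclaimed later.
	 */
	zswap_invalidate_entry(tree, entry);
	folio_mark_dirty(folio);

	return true;
}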

Another optimization for -EEXIST is adding LRU_STOP to support
terminating the shrinking process, to avoid evicting a warmer region;
see the sketch below.
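
The shrinker callback can then bail out when it runs into a folio that
is still in the swapcache, roughly like this (an illustrative sketch of
shrink_memcg_cb(); the real patch tracks more state, e.g. whether we
are in the dynamic shrinker context):

/* mm/zswap.c -- illustrative sketch of returning LRU_STOP */
static enum lru_status shrink_memcg_cb(struct list_head *item,
				       struct list_lru_one *l,
				       spinlock_t *lock, void *arg)
{
	...
	writeback_result = zswap_writeback_entry(entry, swpentry);

	if (writeback_result == -EEXIST) {
		/*
		 * The folio was found in the swapcache, so we are
		 * shrinking into the warmer region: stop the lru walk
		 * instead of evicting warmer entries.
		 */
		return LRU_STOP;
	}
	...
}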

Testing uses a kernel build in tmpfs, with one 50GB swapfile, the
zswap shrinker_enabled, and memory.max set to 2GB.

        mm-unstable   zswap-optimize
real    63.90s        63.25s
user    1064.05s      1063.40s
sys     292.32s       270.94s

The main gain is in sys CPU time, about a 7% improvement.

Thanks for review and comments!

Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
---
Chengming Zhou (6):
mm/zswap: add more comments in shrink_memcg_cb()
mm/zswap: invalidate zswap entry when swap entry free
mm/zswap: stop lru list shrinking when encounter warm region
mm/zswap: remove duplicate_entry debug value
mm/zswap: only support zswap_exclusive_loads_enabled
mm/zswap: zswap entry doesn't need refcount anymore

 include/linux/list_lru.h |   2 +
 include/linux/zswap.h    |   4 +-
 mm/Kconfig               |  16 ------
 mm/list_lru.c            |   3 ++
 mm/swap_slots.c          |   3 ++
 mm/swapfile.c            |   1 -
 mm/zswap.c               | 136 ++++++++++++++++-------------------------------
 7 files changed, 56 insertions(+), 109 deletions(-)
---
base-commit: 3a92c45e4ba694381c46994f3fde0d8544a2088b
change-id: 20240201-b4-zswap-invalidate-entry-b77dea670325

Best regards,
--
Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>