Re: [PATCH 5/6] mm/zswap: only support zswap_exclusive_loads_enabled

From: Johannes Weiner
Date: Thu Feb 01 2024 - 13:15:09 EST


On Thu, Feb 01, 2024 at 03:49:05PM +0000, Chengming Zhou wrote:
> The !zswap_exclusive_loads_enabled mode leaves the compressed copy in
> the zswap tree and LRU list after the folio is swapped in.
>
> There are some disadvantages in this mode:
> 1. It wastes memory, since there are two copies of the data: the
> folio and the compressed data in zswap. And the compressed data is
> unlikely to be useful in the near future.
>
> 2. If the folio is dirtied, the compressed data is no longer useful,
> but we don't know that and don't invalidate the stale copy in zswap.
>
> 3. It's not reclaimable by the zswap shrinker, since zswap_writeback_entry()
> will always return -EEXIST and terminate the shrinking process.
>
> On the other hand, the only downside of zswap_exclusive_loads_enabled
> is slightly more CPU usage/latency from compressing the folio again,
> and even that cost is the same in both modes if the folio is removed
> from swapcache or dirtied.
>
> I'm not sure whether we should accept the above disadvantages in the
> !zswap_exclusive_loads_enabled case, so I'm sending this out for discussion.
>
> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>

This is interesting.

First, I will say that I never liked this config option, because it's
nearly impossible for a user to answer this question. Much better to
just pick a reasonable default.

What should the default be?

Caching "swapout work" is helpful when the system is thrashing, because
then recently swapped-in pages might get swapped out again very soon. It
certainly makes sense with conventional swap, because keeping a clean
copy on the disk saves IO work and doesn't cost any additional memory.

But with zswap, it's different. It saves some compression work on a
thrashing page. But the compressed copy itself occupies memory, so
keeping it around contributes to a higher rate of thrashing. And that
can cause IO in other places, like zswap writeback and file memory.
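The trade-off above can be illustrated with a toy model of a single page. This is a hypothetical sketch, not zswap's code: struct toy_page, toy_swapout(), toy_swapin(), toy_writeback() and toy_run() are invented names, the 4096-byte page and 1024-byte compressed size are arbitrary assumptions, and the -EEXIST return only mimics the zswap_writeback_entry() behavior described in the patch. It shows that keeping the compressed copy saves one recompression when a clean page thrashes, at the cost of holding both copies in memory and leaving the entry unwritable by the shrinker while the folio sits in swapcache.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size */
#define TOY_CSIZE     1024u /* assumed compressed size, arbitrary */

/* Hypothetical state of one swapped page; not kernel structures. */
struct toy_page {
    bool in_zswap;              /* compressed copy in the zswap tree/LRU */
    bool in_swapcache;          /* uncompressed folio in swapcache */
    bool dirty;                 /* folio differs from any compressed copy */
    size_t csize;               /* bytes held by the compressed copy */
    unsigned long compressions; /* number of compression operations */
};

struct toy_result {
    size_t resident_after_swapin;
    int writeback_ret;
    unsigned long compressions;
};

/* Reclaim: recompress unless a clean, still-valid copy is cached. */
static void toy_swapout(struct toy_page *p)
{
    if (!p->in_zswap || p->dirty) {
        p->compressions++;
        p->in_zswap = true;
        p->csize = TOY_CSIZE;
    }
    p->in_swapcache = false;
    p->dirty = false;
}

/* Fault the page back in; exclusive mode drops the compressed copy. */
static void toy_swapin(struct toy_page *p, bool exclusive)
{
    p->in_swapcache = true;
    if (exclusive) {
        p->in_zswap = false;
        p->csize = 0;
        p->dirty = true; /* no backing copy left, must be recompressed */
    }
}

/* Toy shrinker writeback: refuses entries whose folio is in swapcache. */
static int toy_writeback(struct toy_page *p)
{
    if (!p->in_zswap)
        return -ENOENT;
    if (p->in_swapcache)
        return -EEXIST;
    p->in_zswap = false;
    p->csize = 0;
    return 0;
}

/* RAM used by this page: uncompressed folio plus any compressed copy. */
static size_t toy_resident_bytes(const struct toy_page *p)
{
    return (p->in_swapcache ? TOY_PAGE_SIZE : 0) + p->csize;
}

/* One thrash cycle: swap out, swap in, try writeback, swap out again. */
static struct toy_result toy_run(bool exclusive)
{
    struct toy_page p = { .dirty = true }; /* never compressed yet */
    struct toy_result r;

    toy_swapout(&p);
    toy_swapin(&p, exclusive);
    r.resident_after_swapin = toy_resident_bytes(&p);
    r.writeback_ret = toy_writeback(&p);
    toy_swapout(&p);
    r.compressions = p.compressions;
    return r;
}
```

Under these assumptions, toy_run(false) reports 5120 resident bytes after swapin, -EEXIST from writeback and one compression, while toy_run(true) reports 4096 bytes and two compressions (writeback returns -ENOENT because the entry is already gone).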

It would be useful to have an A/B test to confirm that not caching is
better. Can you run your test with and without keeping the cache, and
in addition to the timings also compare the deltas for pgscan_anon,
pgscan_file, workingset_refault_anon, workingset_refault_file?
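For the A/B run, the requested deltas can be collected by sampling /proc/vmstat before and after each test. The following is one possible helper, not an existing tool: vmstat_parse(), vmstat_snapshot() and vmstat_print_delta() are hypothetical names, and it assumes a kernel recent enough to export all four counters in /proc/vmstat and a vmstat file smaller than 64 KiB.

```c
#include <stdio.h>
#include <string.h>

/* Counters requested for the comparison, in output order. */
static const char *const counter_names[] = {
    "pgscan_anon", "pgscan_file",
    "workingset_refault_anon", "workingset_refault_file",
};
#define NCOUNTERS (sizeof(counter_names) / sizeof(counter_names[0]))

struct vmstat_snap {
    unsigned long long val[NCOUNTERS];
};

/* Parse "name value" lines in /proc/vmstat format. Returns how many of
 * the requested counters were found; missing ones are left at 0. */
static size_t vmstat_parse(const char *text, struct vmstat_snap *s)
{
    size_t found = 0;

    memset(s, 0, sizeof(*s));
    while (*text) {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);
        char line[128], name[64];
        unsigned long long v;

        /* Parse one line at a time so blank lines can't shift fields. */
        if (len < sizeof(line)) {
            memcpy(line, text, len);
            line[len] = '\0';
            if (sscanf(line, "%63s %llu", name, &v) == 2) {
                for (size_t i = 0; i < NCOUNTERS; i++) {
                    if (strcmp(name, counter_names[i]) == 0) {
                        s->val[i] = v;
                        found++;
                    }
                }
            }
        }
        if (!nl)
            break;
        text = nl + 1;
    }
    return found;
}

/* Snapshot /proc/vmstat; returns -1 if it can't be read or a counter is
 * missing. Assumes the file fits in the 64 KiB buffer. */
static int vmstat_snapshot(struct vmstat_snap *s)
{
    static char buf[65536];
    FILE *f = fopen("/proc/vmstat", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmstat_parse(buf, s) == NCOUNTERS ? 0 : -1;
}

/* Print after - before for each counter (counters only increase). */
static void vmstat_print_delta(const struct vmstat_snap *before,
                               const struct vmstat_snap *after)
{
    for (size_t i = 0; i < NCOUNTERS; i++)
        printf("%-24s %llu\n", counter_names[i],
               after->val[i] - before->val[i]);
}
```

In each mode, call vmstat_snapshot() before and after the workload and pass both snapshots to vmstat_print_delta(), alongside the timings.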