Re: [RFC PATCH 0/2] remove SWAP_MAP_SHMEM

From: Baolin Wang
Date: Mon Sep 23 2024 - 23:25:23 EST




On 2024/9/24 10:15, Yosry Ahmed wrote:
On Mon, Sep 23, 2024 at 6:55 PM Baolin Wang
<baolin.wang@xxxxxxxxxxxxxxxxx> wrote:



On 2024/9/24 07:11, Nhat Pham wrote:
The SWAP_MAP_SHMEM state was originally introduced in the commit
aaa468653b4a ("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a
swap entry belongs to shmem during swapoff.

However, swapoff has since been rewritten drastically in the commit
b56a2d8af914 ("mm: rid swapoff of quadratic complexity"). Now
having swap count == SWAP_MAP_SHMEM value is basically the same as having
swap count == 1, and swap_shmem_alloc() behaves analogously to
swap_duplicate().

This RFC proposes the removal of this state and the associated helper to
simplify the state machine (both mentally and code-wise). We will also
have an extra state/special value that can be repurposed (for swap entries
that never get re-duplicated).

Another motivation (albeit a bit premature at the moment) is the new swap
abstraction I am currently working on, that would allow for swap/zswap
decoupling, swapoff optimization, etc. The fewer states and swap API
functions there are, the simpler the conversion will be.

I am sending this series first as an RFC, just in case I missed something
or misunderstood this state, or if someone has a swap optimization in mind
for shmem that would require this special state.

The idea makes sense to me. I did a quick test with shmem mTHP, and
encountered the following warning which is triggered by
'VM_WARN_ON(usage == 1 && nr > 1)' in __swap_duplicate().

Apparently __swap_duplicate() does not currently handle increasing the
swap count for multiple swap entries by 1 (i.e. usage == 1) because it
does not handle rolling back count increases when
swap_count_continued() fails.

I guess this voids my Reviewed-by until we sort this out. Technically
swap_count_continued() won't ever be called for shmem because we only
ever increment the count by 1, but there is no way to know this in
__swap_duplicate() without SWAP_MAP_SHMEM.
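
To make the rollback concern concrete, here is a small userspace toy model. It is only a sketch, not kernel code: toy_swap_map, toy_dup_range(), TOY_MAP_MAX and TOY_NR_ENTRIES are made-up names, and "entry already at the maximum count" stands in for a swap_count_continued() failure. It bumps the count of nr consecutive entries by one and shows that, without a rollback step, a failure partway through leaves the earlier entries incremented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not kernel definitions. */
#define TOY_MAP_MAX	0x3e
#define TOY_NR_ENTRIES	8

struct toy_swap_map {
	unsigned char count[TOY_NR_ENTRIES];
};

/*
 * Increase the count of nr consecutive entries by one.
 *
 * Returns 0 on success, or -1 if some entry is already at TOY_MAP_MAX
 * (the toy equivalent of a failed count continuation).
 *
 * With rollback == false, entries bumped before the failing one keep
 * their new count, so the range ends up only partially duplicated.
 * With rollback == true, those increments are undone before returning.
 */
static int toy_dup_range(struct toy_swap_map *map, size_t start, size_t nr,
			 bool rollback)
{
	size_t i;

	assert(start + nr <= TOY_NR_ENTRIES);

	for (i = 0; i < nr; i++) {
		if (map->count[start + i] >= TOY_MAP_MAX)
			goto fail;
		map->count[start + i]++;
	}
	return 0;

fail:
	/* Entries [start, start + i) were already incremented. */
	if (rollback)
		while (i-- > 0)
			map->count[start + i]--;
	return -1;
}
```

Calling toy_dup_range() on a range whose third entry is already at TOY_MAP_MAX fails in both modes, but only the rollback variant leaves the first two counts untouched.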

Agreed. An easy solution might be to add a new boolean parameter to
__swap_duplicate() that indicates whether the count is being increased
for shmem swap entries?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index cebc244ee60f..21f1eec2c30a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3607,7 +3607,7 @@ void si_swapinfo(struct sysinfo *val)
  * - swap-cache reference is requested but the entry is not used. -> ENOENT
  * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
  */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr, bool shmem)
 {
 	struct swap_info_struct *si;
 	struct swap_cluster_info *ci;
@@ -3620,7 +3620,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 
 	offset = swp_offset(entry);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
-	VM_WARN_ON(usage == 1 && nr > 1);
+	VM_WARN_ON(usage == 1 && nr > 1 && !shmem);
 	ci = lock_cluster_or_swap_info(si, offset);
 
 	err = 0;
@@ -3661,7 +3661,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 			has_cache = SWAP_HAS_CACHE;
 		else if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
 			count += usage;
-		else if (swap_count_continued(si, offset + i, count))
+		else if (!shmem && swap_count_continued(si, offset + i, count))
 			count = COUNT_CONTINUED;
 		else {
 			/*