[RFC PATCH] mm: add folio in swapcache if swapin from zswap

From: Chengming Zhou
Date: Fri Mar 22 2024 - 12:40:21 EST


From: Chengming Zhou <chengming.zhou@xxxxxxxxx>

There is a report of data corruption caused by double swapin, which is
only possible in the skip swapcache path on SWP_SYNCHRONOUS_IO backends.

The root cause is that zswap, unlike other "normal" swap backends, does
not keep a copy of the data after the first swapin. So if the folio from
the first swapin can't be installed in the page table successfully, it is
just freed directly. Then on the second swapin, nothing is found in zswap
and stale data is read from the swapfile, which is how this data
corruption happens.

We can fix it by always adding the folio into the swapcache if we know the
pinned swap entry can be found in zswap, so it won't get freed even if it
can't be installed successfully on the first swapin.

Note that we have to check whether the swap entry is in zswap only after
the entry has been pinned; only then can we be sure the result of the
check is stable.

Reported-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
Closes: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@xxxxxxxxxxxxxx
Signed-off-by: Chengming Zhou <chengming.zhou@xxxxxxxxx>
---
include/linux/zswap.h | 6 ++++++
mm/memory.c | 28 ++++++++++++++++++++++++----
mm/zswap.c | 10 ++++++++++
3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 2a85b941db97..180d0b1f0886 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -36,6 +36,7 @@ void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
void zswap_lruvec_state_init(struct lruvec *lruvec);
void zswap_folio_swapin(struct folio *folio);
bool is_zswap_enabled(void);
+bool zswap_find(swp_entry_t swp);
#else

struct zswap_lruvec_state {};
@@ -65,6 +66,11 @@ static inline bool is_zswap_enabled(void)
return false;
}

+static inline bool zswap_find(swp_entry_t swp)
+{
+ return false;
+}
+
#endif

#endif /* _LINUX_ZSWAP_H */
diff --git a/mm/memory.c b/mm/memory.c
index 4f2caf1c3c4d..a564b2b8faca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4031,18 +4031,38 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
ret = VM_FAULT_OOM;
goto out_page;
}
+
+ /*
+ * Add the folio to the swapcache if this pinned
+ * swap entry is found in zswap, since zswap does
+ * not keep a copy of the data after swapin.
+ * Otherwise the data would be lost if the folio
+ * later fails to be installed in the page table.
+ */
+ if (zswap_find(entry)) {
+ if (add_to_swap_cache(folio, entry,
+ GFP_KERNEL, &shadow)) {
+ ret = VM_FAULT_OOM;
+ goto out_page;
+ }
+ swapcache = folio;
+ need_clear_cache = false;
+ } else {
+ shadow = get_shadow_from_swap_cache(entry);
+ /* To provide entry to swap_read_folio() */
+ folio->swap = entry;
+ }
+
mem_cgroup_swapin_uncharge_swap(entry);

- shadow = get_shadow_from_swap_cache(entry);
if (shadow)
workingset_refault(folio, shadow);

folio_add_lru(folio);

- /* To provide entry to swap_read_folio() */
- folio->swap = entry;
swap_read_folio(folio, true, NULL);
- folio->private = NULL;
+ if (need_clear_cache)
+ folio->private = NULL;
}
} else {
page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
diff --git a/mm/zswap.c b/mm/zswap.c
index c4979c76d58e..84a904a788a3 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1601,6 +1601,16 @@ void zswap_invalidate(swp_entry_t swp)
zswap_entry_free(entry);
}

+bool zswap_find(swp_entry_t swp)
+{
+ pgoff_t offset = swp_offset(swp);
+ struct xarray *tree = swap_zswap_tree(swp);
+ struct zswap_entry *entry;
+
+ entry = xa_find(tree, &offset, offset, XA_PRESENT);
+ return entry != NULL;
+}
+
int zswap_swapon(int type, unsigned long nr_pages)
{
struct xarray *trees, *tree;
--
2.20.1