Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted

From: David Hildenbrand
Date: Thu Jun 06 2024 - 16:22:57 EST


On 06.06.24 20:48, Yosry Ahmed wrote:
With ongoing work to support large folio swapin, it is important to make
sure we do not pass large folios to zswap_load() without implementing
proper support.

For example, if a swapin fault observes that contiguous PTEs are
pointing to contiguous swap entries and tries to swap them in as a large
folio, swap_read_folio() will pass in a large folio to zswap_load(), but
zswap_load() will only effectively load the first page in the folio. If
the first page is not in zswap, the folio will be read from disk, even
though other pages may be in zswap.

In both cases, this will lead to silent data corruption.

Proper large folio swapin support needs to go into zswap before zswap
can be enabled in a system that supports large folio swapin.

Looking at callers of swap_read_folio(), it seems like they are either
allocated from __read_swap_cache_async() or do_swap_page() in the
SWP_SYNCHRONOUS_IO path. Both of which allocate order-0 folios, so we
are fine for now.

Add a VM_BUG_ON() in zswap_load() to make sure that we detect changes in
the order of those allocations without proper handling of zswap.

Alternatively, swap_read_folio() (or its callers) can be updated to have
a fallback mechanism that splits large folios or reads subpages
separately. Similar logic may be needed anyway in case part of a large
folio is already in the swapcache and the rest of it is swapped out.

Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
---

Sorry for the long CC list, I just found myself repeatedly looking at
new series that add swap support for mTHPs / large folios, making sure
they do not break with zswap or make incorrect assumptions. This debug
check should give us some peace of mind. Hopefully this patch will also
raise awareness among people who are working on this.

---
mm/zswap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/zswap.c b/mm/zswap.c
index b9b35ef86d9be..6007252429bb2 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1577,6 +1577,9 @@ bool zswap_load(struct folio *folio)
if (!entry)
return false;
+ /* Zswap loads do not handle large folio swapins correctly yet */
+ VM_BUG_ON(folio_test_large(folio));
+

There is no way we could have a WARN_ON_ONCE() and recover, right?

--
Cheers,

David / dhildenb