Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted

From: David Hildenbrand
Date: Fri Jun 07 2024 - 15:04:02 EST


On 07.06.24 20:58, Yosry Ahmed wrote:
On Fri, Jun 7, 2024 at 11:52 AM David Hildenbrand <david@xxxxxxxxxx> wrote:

I have no strong opinion on this one, but likely a VM_WARN_ON would also
be sufficient to find such issues early during testing. No need to crash
the machine.

I thought VM_BUG_ON() was less frowned-upon than BUG_ON(), but after
some digging I found your patches to checkpatch and Linus clearly
stating that it isn't.

:) yes.

VM_BUG_ON is not particularly helpful IMHO. If you want something to be
found early during testing, VM_WARN_ON is good enough.

Ever since Fedora stopped enabling CONFIG_DEBUG_VM, VM_* friends are
primarily reported during early/development testing only. But maybe some
distro out there still sets it.


How about something like the following (untested), it is the minimal
recovery we can do but should work for a lot of cases, and does
nothing beyond a warning if we can swapin the large folio from disk:

diff --git a/mm/page_io.c b/mm/page_io.c
index f1a9cfab6e748..8f441dd8e109f 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct
swap_iocb **plug)
delayacct_swapin_start();

if (zswap_load(folio)) {
- folio_mark_uptodate(folio);
folio_unlock(folio);
} else if (data_race(sis->flags & SWP_FS_OPS)) {
swap_read_folio_fs(folio, plug);
diff --git a/mm/zswap.c b/mm/zswap.c
index 6007252429bb2..cc04db6bb217e 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1557,6 +1557,22 @@ bool zswap_load(struct folio *folio)

VM_WARN_ON_ONCE(!folio_test_locked(folio));

+ /*
+ * Large folios should not be swapped in while zswap is being used, as
+ * they are not properly handled.
+ *
+ * If any of the subpages are in zswap, reading from disk would result
+ * in data corruption, so return true without marking the folio uptodate
+ * so that an IO error is emitted (e.g. do_swap_page() will sigfault).
+ *
+ * Otherwise, return false and read the folio from disk.
+ */
+ if (WARN_ON_ONCE(folio_test_large(folio))) {
+ if (xa_find(tree, &offset, offset +
folio_nr_pages(folio) - 1, 0))
+ return true;
+ return false;
+ }
+
/*
* When reading into the swapcache, invalidate our entry. The
* swapcache can be the authoritative owner of the page and
@@ -1593,7 +1609,7 @@ bool zswap_load(struct folio *folio)
zswap_entry_free(entry);
folio_mark_dirty(folio);
}
-
+ folio_mark_uptodate(folio);
return true;
}

One problem is that even if zswap was never enabled, the warning will
be emitted just if CONFIG_ZSWAP is on. Perhaps we need a variable or
static key if zswap was "ever" enabled.

We should use WARN_ON_ONCE() only for things that cannot happen. So if
there are cases where this could be triggered today, it would be
problematic -- especially if it can be triggered from unprivileged user
space. But if we're concerned of other code messing up our invariant in
the future (e.g., enabling large folios without taking proper care about
zswap etc), we're good to add it.

Right now I can't see any paths allocating large folios for swapin, so
I think it cannot happen. Once someone tries adding it, the warning
will fire if CONFIG_ZSWAP is used, even if zswap is disabled.

At this point we will have several options:
- Make large folios swapin depend on !CONFIG_ZSWAP for now.
- Keep track if zswap was ever enabled and make the warning
conditional on it. We should also always fallback to order-0 if zswap
was ever enabled.
- Properly handle large folio swapin with zswap.

Does this sound reasonable to you?

Yes, probably easiest is 1) -> 3) or if we don't want to glue it to config options 2) -> 3).

--
Cheers,

David / dhildenb