Re: [PATCH v3] mm/shmem, swap: fix race of truncate and swap entry split

From: Baolin Wang

Date: Wed Jan 28 2026 - 21:27:33 EST




On 1/29/26 12:52 AM, Kairui Song wrote:
On Wed, Jan 28, 2026 at 05:02:34AM +0800, Chris Mason wrote:

diff --git a/mm/shmem.c b/mm/shmem.c
--- a/mm/shmem.c
+++ b/mm/shmem.c

[ ... ]

@@ -1191,12 +1203,23 @@
folio = fbatch.folios[i];

if (xa_is_value(folio)) {
+ int order;
long swaps_freed;

if (unfalloc)
continue;
- swaps_freed = shmem_free_swap(mapping, indices[i], folio);
+ swaps_freed = shmem_free_swap(mapping, indices[i],
+ end - 1, folio);
if (!swaps_freed) {
+ /*
+ * If found a large swap entry cross the end border,
+ * skip it as the truncate_inode_partial_folio above
+ * should have at least zerod its content once.
+ */
+ order = shmem_confirm_swap(mapping, indices[i],
+ radix_to_swp_entry(folio));
+ if (order > 0 && indices[i] + (1 << order) > end)
+ continue;
/* Swap was replaced by page: retry */
index = indices[i];
break;

Could this code infinite loop if a large swap entry crosses the START
boundary instead of the END boundary?

Thanks for the review! It's really helpful :)

When shmem_free_swap() returns 0, it could be either because:
(a) The entry was replaced by a page, or
(b) The entry failed the boundary check (base < index ||
base + nr_pages - 1 > end)

The boundary check here only handles case (b) for entries crossing the
END boundary:

if (order > 0 && indices[i] + (1 << order) > end)
continue;

But what happens if the entry crosses the START boundary? If
find_get_entries() returns a large swap entry at indices[i] where
the entry's base (calculated as indices[i] & ~((1 << order) - 1)) is
less than the truncation start point, then shmem_free_swap() will
return 0 due to the "base < index" check. The code will then call
shmem_confirm_swap(), get the order, check if it crosses the END
boundary (which it doesn't), and retry with the same index:

index = indices[i];
break;

The next iteration will find the same entry again at the same index,
leading to an infinite loop. For example:

- Truncating range [18, 30]
- Large swap entry at [16, 23] (order 3, 8 pages)
- indices[i] = 18
- shmem_free_swap() sees base=16 < index=18, returns 0
- Check: 18 + 8 > 30 is false (26 <= 30)
- Retries with index=18
- Loop repeats indefinitely

I think this is a valid issue. And it's worse than that, during the `while (index < end)` loop a new large entry can land anywhere in the range, if one interaction's starting `index` points to the middle of any large entry, an infinite loop will occur: indices[0] are always equal to the `index` iteration value of that moments, shmem_free_swap will fail because the swap entry's index doesn't match indices[0], and so the `index = indices[i]; break;` keep it loop forever.

The chance seems very low though.

Should the boundary check also handle the START case, perhaps:

if (order > 0) {
pgoff_t base = indices[i] & ~((1UL << order) - 1);
if (base + (1 << order) - 1 > end || base < start)
continue;
}

This still doesn't cover the case when a new large entry somehow lands in the range during the loop.

where 'start' is preserved from before the loop?

How about following patch:

From 863f38c757ee0898b6b7f0f8c695f551a1380ce8 Mon Sep 17 00:00:00 2001
From: Kairui Song <kasong@xxxxxxxxxxx>
Date: Thu, 29 Jan 2026 00:19:23 +0800
Subject: [PATCH] mm, shmem: prevent infinite loop on truncate race

When truncating a large swap entry, shmem_free_swap() returns 0 when the
entry's index doesn't match the given index due to lookup alignment. The
failure fallback path checks if the entry crosses the end border and
aborts when it happens, so truncate won't erase an unexpected entry or
range. But one scenario was ignored.

When `index` points to the middle of a large swap entry, and the large
swap entry doesn't go across the end border, find_get_entries() will
return that large swap entry as the first item in the batch with
`indices[0]` equal to `index`. The entry's base index will be smaller
than `indices[0]`, so shmem_free_swap() will fail and return 0 due to
the "base < index" check. The code will then call shmem_confirm_swap(),
get the order, check if it crosses the END boundary (which it doesn't),
and retry with the same index.

The next iteration will find the same entry again at the same index with
same indices, leading to an infinite loop.

Fix this by retrying with a round-down index, and abort if the index is
smaller than the truncate range.

Reported-by: Chris Mason <clm@xxxxxxxx>
Closes: https://lore.kernel.org/linux-mm/20260128130336.727049-1-clm@xxxxxxxx/
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Fixes: 8a1968bd997f ("mm/shmem, swap: fix race of truncate and swap entry split")
Signed-off-by: Kairui Song <kasong@xxxxxxxxxxx>
---

Thanks. The fix looks good to me.
Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>

(BTW, I think we can simplify the logic by moving the boundary validation into shmem_free_swap() in the future).

mm/shmem.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index b9ddd38621a0..fe3719eb5a3c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1211,17 +1211,22 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, uoff_t lend,
swaps_freed = shmem_free_swap(mapping, indices[i],
end - 1, folio);
if (!swaps_freed) {
- /*
- * If found a large swap entry cross the end border,
- * skip it as the truncate_inode_partial_folio above
- * should have at least zerod its content once.
- */
+ pgoff_t base = indices[i];
+
order = shmem_confirm_swap(mapping, indices[i],
radix_to_swp_entry(folio));
- if (order > 0 && indices[i] + (1 << order) > end)
- continue;
- /* Swap was replaced by page: retry */
- index = indices[i];
+ /*
+ * If found a large swap entry cross the end or start
+ * border, skip it as the truncate_inode_partial_folio
+ * above should have at least zerod its content once.
+ */
+ if (order > 0) {
+ base = round_down(base, 1 << order);
+ if (base < start || base + (1 << order) > end)
+ continue;
+ }
+ /* Swap was replaced by page or extended, retry */
+ index = base;
break;
}
nr_swaps_freed += swaps_freed;