Re: [PATCH] mm, swap: free the cluster extend table on teardown

From: David CARLIER

Date: Wed Jun 03 2026 - 16:56:46 EST

On Wed, 3 Jun 2026 at 03:42, Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> On Wed, Jun 3, 2026 at 6:27 AM David Carlier <devnexen@xxxxxxxxx> wrote:
> >
> > swap_cluster_free_table() frees every per-cluster side table but
> > ci->extend_table. That table is only released by
> > swap_extend_table_try_free(), which the teardown path never calls, so a
> > cluster can be freed with an extend table still attached.
> >
> > It can also linger while the cluster is live. swap_dup_entries_cluster()
> > drops the lock to allocate an extend table when a slot reaches
> > SWP_TB_COUNT_MAX - 1, then retries. If the count dropped in the meantime,
> > the retry takes the normal path and leaves the table behind, all entries
> > zero; only the failure path frees it.
> >
> > Since a swap_cluster_info is reused in place and swap_extend_table_alloc()
> > skips allocation when ci->extend_table is set, the next user of the
> > cluster inherits the stale table and its leftover counts, corrupting the
> > swap count of any slot that overflows. CONFIG_DEBUG_VM catches the
>
> There won't be a corruption, extend_table is all zero at this point,
> the leak on swapoff is real though.
>
> > dangling table in swap_cluster_assert_empty(); otherwise it is silent.
> >
> > Free it in swap_cluster_free_table(), and also on the
> > swap_dup_entries_cluster() success path to match the failure path.
> >
> > Reported-by: syzbot+deedf22929084640666f@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Closes: https://syzkaller.appspot.com/bug?extid=deedf22929084640666f
> > Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> > ---
> > mm/swapfile.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 615d90867111..a69a26aec4c0 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -432,6 +432,9 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
> > ci->zero_bitmap = NULL;
> > #endif
> >
> > + kfree(ci->extend_table);
> > + ci->extend_table = NULL;
> > +
>
> Still a bit too late to avoid the WARN? The WARN is already triggered
> at this point, swap_cluster_free_table is called after
> swap_cluster_assert_empty.
>
> > table = (struct swap_table *)rcu_access_pointer(ci->table);
> > if (!table)
> > return;
> > @@ -1711,6 +1714,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
> > goto failed;
> > }
> > } while (++ci_off < ci_end);
> > + swap_extend_table_try_free(ci);
> > swap_cluster_unlock(ci);
> > return 0;
> > failed:
> > --
> > 2.53.0
>
> I think we have already fixed this?
> https://lore.kernel.org/all/6a1eac8e.fbc46276.3c3783.0008.GAE@xxxxxxxxxx/T/

Thanks for the review.

Agreed on all counts. Commit 0475fde0f68d already addresses both the
warning and the swapoff leak at the allocation site, so this patch is
redundant. Please drop it.
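
For the archive, the ordering problem pointed out in the review can be
sketched in userspace C. This is a toy model, not kernel code: every
toy_* name and TOY_SLOTS are made up for illustration. The only thing
taken from the thread is that swap_cluster_assert_empty() runs before
swap_cluster_free_table(), and that the leftover table is all zero.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_SLOTS 4	/* hypothetical table size, illustration only */

/* Toy stand-in for struct swap_cluster_info; not the kernel layout. */
struct toy_cluster {
	unsigned char *extend_table;	/* NULL when no table is attached */
};

/* Models swap_cluster_assert_empty(): true means the check would warn. */
static bool toy_assert_empty_warns(const struct toy_cluster *ci)
{
	return ci->extend_table != NULL;
}

/* Models swap_extend_table_try_free(): free only if all entries are zero. */
static void toy_extend_table_try_free(struct toy_cluster *ci)
{
	if (!ci->extend_table)
		return;
	for (int i = 0; i < TOY_SLOTS; i++)
		if (ci->extend_table[i])
			return;
	free(ci->extend_table);
	ci->extend_table = NULL;
}

/* Models the teardown order: the check runs first, the free comes after. */
static bool toy_teardown(struct toy_cluster *ci)
{
	bool warned = toy_assert_empty_warns(ci);

	free(ci->extend_table);		/* too late to silence the check */
	ci->extend_table = NULL;
	return warned;
}

/*
 * Returns 1 if teardown warned, 0 if it did not, -1 on allocation failure.
 * free_early selects whether the unused table is dropped before teardown.
 */
static int toy_run(bool free_early)
{
	struct toy_cluster ci = { 0 };

	/* A dup retry allocated a table that ended up unused (all zero). */
	ci.extend_table = calloc(TOY_SLOTS, 1);
	if (!ci.extend_table)
		return -1;
	if (free_early)
		toy_extend_table_try_free(&ci);
	return toy_teardown(&ci) ? 1 : 0;
}
```

In this model toy_run(false) reports the warning even though nothing
leaks, while toy_run(true) reports none, which is why a free placed only
in the teardown helper does not help with the WARN.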

Andrew, you're right that no Cc: stable was warranted here.

Cheers.