Re: [PATCH] mm, swap: free the cluster extend table on teardown

From: Kairui Song

Date: Tue Jun 02 2026 - 22:43:13 EST

On Wed, Jun 3, 2026 at 6:27 AM David Carlier <devnexen@xxxxxxxxx> wrote:
>
> swap_cluster_free_table() frees every per-cluster side table but
> ci->extend_table. That table is only released by
> swap_extend_table_try_free(), which the teardown path never calls, so a
> cluster can be freed with an extend table still attached.
>
> It can also linger while the cluster is live. swap_dup_entries_cluster()
> drops the lock to allocate an extend table when a slot reaches
> SWP_TB_COUNT_MAX - 1, then retries. If the count dropped in the meantime,
> the retry takes the normal path and leaves the table behind, all entries
> zero; only the failure path frees it.
>
> Since a swap_cluster_info is reused in place and swap_extend_table_alloc()
> skips allocation when ci->extend_table is set, the next user of the
> cluster inherits the stale table and its leftover counts, corrupting the
> swap count of any slot that overflows. CONFIG_DEBUG_VM catches the

There won't be any corruption: extend_table is all zero at this point.
The leak on swapoff is real, though.
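
To illustrate, here is a rough user-space sketch of the reuse case, assuming
the leftover table really is all zero as described above. Every name in it
(leftover_cluster, leftover_table_alloc, LEFTOVER_SLOTS, ...) is a
hypothetical stand-in, not the code in mm/swapfile.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LEFTOVER_SLOTS 8 /* hypothetical cluster size, for the model only */

struct leftover_cluster {
	unsigned long *extend_table; /* stand-in for ci->extend_table */
};

/* Like the allocation described above: skip if a table is already set. */
static int leftover_table_alloc(struct leftover_cluster *ci)
{
	if (ci->extend_table)
		return 0;
	ci->extend_table = calloc(LEFTOVER_SLOTS, sizeof(*ci->extend_table));
	return ci->extend_table ? 0 : -1;
}

/* Returns 1 if every entry of the attached table is zero. */
static int leftover_table_all_zero(const struct leftover_cluster *ci)
{
	size_t i;

	for (i = 0; i < LEFTOVER_SLOTS; i++)
		if (ci->extend_table[i])
			return 0;
	return 1;
}

/*
 * First user allocates a table, but the retry takes the normal path and
 * never writes a count into it. The cluster is then reused in place and
 * the next user asks for a table again.
 * Returns 1 if the inherited table is all zero, -1 on allocation failure.
 */
static int leftover_reuse_is_clean(void)
{
	struct leftover_cluster ci = { NULL };
	unsigned long *first;
	int clean;

	if (leftover_table_alloc(&ci))
		return -1;
	first = ci.extend_table;

	/* Next user: allocation is skipped, the old table is inherited. */
	if (leftover_table_alloc(&ci))
		return -1;
	assert(ci.extend_table == first);

	clean = leftover_table_all_zero(&ci);
	free(ci.extend_table);
	return clean;
}
```

In this model the next user inherits a table that is indistinguishable from a
freshly zeroed one, so the only problem left is that nothing frees it on
teardown.
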

> dangling table in swap_cluster_assert_empty(); otherwise it is silent.
>
> Free it in swap_cluster_free_table(), and also on the
> swap_dup_entries_cluster() success path to match the failure path.
>
> Reported-by: syzbot+deedf22929084640666f@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=deedf22929084640666f
> Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: David Carlier <devnexen@xxxxxxxxx>
> ---
> mm/swapfile.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 615d90867111..a69a26aec4c0 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -432,6 +432,9 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
> ci->zero_bitmap = NULL;
> #endif
>
> + kfree(ci->extend_table);
> + ci->extend_table = NULL;
> +

Isn't this still a bit too late to avoid the WARN?
swap_cluster_free_table() is called after swap_cluster_assert_empty(),
so the WARN has already triggered by this point.
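
A minimal user-space sketch of the ordering concern, assuming the check
simply warns when a table is still attached. The names (teardown_cluster,
teardown_assert_empty, teardown_free_table, teardown_run) are hypothetical
stand-ins, not the kernel functions themselves:

```c
#include <assert.h>
#include <stdlib.h>

struct teardown_cluster {
	unsigned long *extend_table; /* stand-in for ci->extend_table */
};

static int teardown_warn_count;

/* Stand-in for the emptiness check: warn if a table is still attached. */
static void teardown_assert_empty(const struct teardown_cluster *ci)
{
	if (ci->extend_table)
		teardown_warn_count++;
}

/* Stand-in for the patched free path: release and clear the table. */
static void teardown_free_table(struct teardown_cluster *ci)
{
	free(ci->extend_table);
	ci->extend_table = NULL;
}

/*
 * Run one teardown on a cluster that still has a table attached.
 * free_first == 0: check, then free (the order described above).
 * free_first != 0: free before the check.
 * Returns the number of warnings raised, or -1 on allocation failure.
 */
static int teardown_run(int free_first)
{
	struct teardown_cluster ci;

	ci.extend_table = calloc(4, sizeof(*ci.extend_table));
	if (!ci.extend_table)
		return -1;

	teardown_warn_count = 0;
	if (free_first)
		teardown_free_table(&ci);
	teardown_assert_empty(&ci);
	teardown_free_table(&ci); /* free(NULL) is a no-op */
	assert(ci.extend_table == NULL);

	return teardown_warn_count;
}
```

With free_first == 0 the check fires even though the table is released
afterwards; the warning only goes away if the table is gone before the check
runs.
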

> table = (struct swap_table *)rcu_access_pointer(ci->table);
> if (!table)
> return;
> @@ -1711,6 +1714,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
> goto failed;
> }
> } while (++ci_off < ci_end);
> + swap_extend_table_try_free(ci);
> swap_cluster_unlock(ci);
> return 0;
> failed:
> --
> 2.53.0

I think we have already fixed this?
https://lore.kernel.org/all/6a1eac8e.fbc46276.3c3783.0008.GAE@xxxxxxxxxx/T/