Re: [PATCH] mm: khugepaged: free xarray nodes left behind by failed collapse_file()
From: Zi Yan
Date: Tue Jun 16 2026 - 11:13:40 EST
On 16 Jun 2026, at 10:54, Rik van Riel wrote:
> collapse_file() calls xas_create_range() up front to populate the
> xarray with nodes spanning the entire collapse range, including empty
> slots for the holes (the nr_none entries).
>
> On the success path those nodes are either consumed by the multi-index
> store of new_folio or freed by the nr_none retry-entry dance. But every
> error path that branches straight to the rollback label -- e.g.
> SCAN_TRUNCATED / SCAN_PAGE_LOCK / SCAN_FAIL detected in the main scan
> loop (which then sets nr_none = 0 and jumps to rollback), or SCAN_COPY_MC
> during the copy -- does neither. The empty nodes are left dangling in
> mapping->i_pages and are leaked once the inode is finally evicted. This
> is exactly the "THP collapse_file() failed" case that clear_inode()
> documents and deliberately tolerates without warning.
>
> syzkaller reproduces it trivially with MADV_COLLAPSE on a sparse shmem
> mapping (collapse aborts with SCAN_TRUNCATED because the range is empty),
> and also via slab fault injection, which forces xas_create_range() down
> the xas_nomem() path before the same abort. kmemleak then reports the
> 576-byte struct xa_node objects allocated in xas_alloc()/xas_nomem().
>
> The leaked objects are struct xa_node (576 bytes each), left dangling in
> mapping->i_pages and reclaimed only when the inode is finally evicted.
> Nodes leak when a collapse takes one of the rollback paths (SCAN_TRUNCATED
> / SCAN_PAGE_LOCK / SCAN_FAIL / SCAN_COPY_MC), and only for the empty hole
> slots (the nr_none entries); they are not leaked on the success path (the
> nodes are consumed by the multi-index store and the nr_none dance), and
> slots still holding folios are never touched.
>
> Prune the now-empty nodes on the rollback path. A node is only freed
> once its slot count reaches zero, and storing NULL into an
> already-empty slot is a no-op, so briefly store an XA_RETRY_ENTRY into
> each empty slot and immediately clear it: the clear drops the count
> back to zero and lets xas_store() -> xas_delete_node() free the node
> and its now-empty ancestors. Slots still holding the original folios
> are left untouched.
IIRC, there was a report[1] from Jinjiang (cc'd) about this leak.
[1] https://lore.kernel.org/all/86834731-02ba-43ea-9def-8b8ca156ec4a@xxxxxxxxxx/
>
> Fixes: 77da9389b9d5 ("mm: Convert collapse_shmem to XArray")
> Assisted-by: Claude:claude-opus-4-8 syzkaller
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: Zi Yan <ziy@xxxxxxxxxx>
> Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> Cc: Liam R. Howlett <liam@xxxxxxxxxxxxx>
> Cc: Nico Pache <npache@xxxxxxxxxx>
> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> Cc: Dev Jain <dev.jain@xxxxxxx>
> Cc: Barry Song <baohua@xxxxxxxxxx>
> Cc: Lance Yang <lance.yang@xxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
> mm/khugepaged.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b8452dbdb043..d11a4c9610e1 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2273,6 +2273,39 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>
> rollback:
> /* Something went wrong: roll back page cache changes */
> +
> + /*
> + * xas_create_range() above populated the xarray with nodes spanning
> + * the whole collapse range, including empty slots for the holes
> + * (nr_none entries). On the success path these nodes are consumed by
> + * the multi-index store of new_folio, and the nr_none handling further
> + * up frees the ones covering the holes; but the error paths that branch
> + * straight here do neither. Prune the now-empty nodes explicitly,
> + * otherwise they are leaked until the mapping is torn down -- one of
> + * the two cases called out in the comment in clear_inode().
> + *
> + * A node can only be deleted once its slot count drops to zero, so
> + * briefly store an XA_RETRY_ENTRY into each empty slot and then clear
> + * it again: clearing the retry entry drops the count back to zero and
> + * lets xas_store() -> xas_delete_node() free the node. Slots that
> + * still hold the original folios are left untouched.
> + */
> + xas_lock_irq(&xas);
> + xas_set_order(&xas, start, 0);
> + for (index = start; index < end; index++) {
> + if (!xas_next(&xas)) {
> + xas_store(&xas, XA_RETRY_ENTRY);
> + if (xas_error(&xas))
> + break;
> + }
> + }
> + xas_set_order(&xas, start, 0);
> + for (index = start; index < end; index++) {
> + if (xas_next(&xas) == XA_RETRY_ENTRY)
> + xas_store(&xas, NULL);
> + }
> + xas_unlock_irq(&xas);
> +
> if (nr_none) {
> xas_lock_irq(&xas);
> mapping->nrpages -= nr_none;
> --
> 2.53.0-Meta
--
Best Regards,
Yan, Zi