Re: [PATCH] mm: khugepaged: fix memory leak in collapse_file rollback path
From: Shardul Bankar
Date: Mon Nov 24 2025 - 10:27:52 EST
Hi David, Dev,
Thanks for the clarification, that really helped straighten things out.
On Mon, 2025-11-24 at 17:16 +0530, Dev Jain wrote:
>
> On 24/11/25 3:32 pm, David Hildenbrand (Red Hat) wrote:
> >
> > Do you mean that, if xas_create_range() failed, collapse_file()
> > will call xas_nomem() to preallocate memory?
Yes that's correct.
> > I don't immediately see how xas_create_range() would call
> > xas_nomem().
xas_create_range() indeed doesn't call xas_nomem() internally. The
control flow is:
xas_create_range(&xas)
-> xas_create()
-> may set XA_ERROR(-ENOMEM)
collapse_file() detects xas_error()
-> calls xas_nomem()
-> allocates a spare node and stores it in xas->xa_alloc
> >
> > Note that after we call xas_nomem(), we retry xas_create_range() -
> > - that previously failed to to -ENOMEM.
> >
> > So the assumption is that the xas_create_range() call would
> > consume that memory.
> >
> > I'm sure there is some corner case where it is not the case (some
> > concurrent action? not sure)
>
> Someone else can put slots in the xarray since we dropped the lock. I
> cannot prove this, but surely
> disproving this is harder : )
As Dev pointed out, after xas_nomem(), another thread may expand the
xarray while the lock is dropped. In that case, the next
xas_create_range() retry could succeed without consuming xa_alloc, so
xas_destroy() should clear the unused spare node. Calling xas_destroy()
on all exit paths therefore seems correct and safe.
> > Shouldn't we just call xas_destroy() in any case, also when
> > everything succeeded?
>
> Yeah you are right. We should probably do
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index abe54f0043c7..0794a99c807f100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1872,11 +1872,14 @@ staticintcollapse_file(struct mm_struct *mm,
> unsignedlongaddr,
> do {
> xas_lock_irq(&xas);
> xas_create_range(&xas);
> - if (!xas_error(&xas))
> + if (!xas_error(&xas)) {
> + xas_destroy(&xas);
> break;
> + }
> xas_unlock_irq(&xas);
> if (!xas_nomem(&xas, GFP_KERNEL)) {
> result = SCAN_FAIL;
> + xas_destroy(&xas);
> goto rollback;
> }
> } while (1);
I’ll prepare v2 implementing Dev’s suggestion and update the commit
message accordingly.
Thanks and Regards,
Shardul