Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure
From: Matthew Brost
Date: Wed Mar 04 2026 - 17:07:04 EST
On Wed, Mar 04, 2026 at 04:54:01PM -0500, Zi Yan wrote:
> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>
> > On 3/5/26 02:17, Zi Yan wrote:
> >> On 4 Mar 2026, at 7:01, Usama Arif wrote:
> >>
> >>> From: Usama Arif <usama.arif@xxxxxxxxx>
> >>>
> >>> migrate_vma_split_unmapped_folio() takes an extra reference via
> >>> folio_get() before calling folio_split_unmapped(). On success, the
> >>> split consumes this reference: __folio_freeze_and_split_unmapped()
> >>> expects the +1 in its folio_ref_freeze() check, and distributes it
> >>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
> >>> are later balanced by folio_put() calls in __migrate_device_finalize().
> >>>
> >>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
> >>> -EAGAIN), the function returns without calling folio_put(). The extra
> >>> reference is never released.
> >>>
> >>> Add the missing folio_put() on the error path.
> >>>
> >>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
> >>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@xxxxxxxxxxxxxx/
> >>> Reported-by: Nico Pache <npache@xxxxxxxxxx>
> >>> Signed-off-by: Usama Arif <usama.arif@xxxxxxxxx>
> >>> ---
> >>> mm/migrate_device.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >>> index 0a8b31939640f..351ecd9065d13 100644
> >>> --- a/mm/migrate_device.c
> >>> +++ b/mm/migrate_device.c
> >>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
> >>>  	folio_get(folio);
> >>>  	split_huge_pmd_address(migrate->vma, addr, true);
> >>>  	ret = folio_split_unmapped(folio, 0);
> >>> -	if (ret)
> >>> +	if (ret) {
> >>> +		folio_put(folio);
> >>>  		return ret;
> >>> +	}
> >>>  	migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
> >>>  	flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
> >>>  	pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
> >>> --
> >>> 2.47.3
> >>
> >> Add Balbir, who wrote the code, to comment on this.
> >>
> >
> > Thanks Zi!
> >
> > Just wondering if there is a reproducer for the issue and how the fix was tested?
> > I expect migrate_vma_finalize() to be called for folios, even when split failed and
> > drop the lock.
>
> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
> If so, how does it distinguish between split folios and failed-to-split folios?
> By comparing source and destination folio orders?
>
> What we see from migrate_vma_split_unmapped_folio() is that
> it adds a refcount for all input folios, but only drops a refcount
> for the split folio. Doesn’t that leave failed-to-split folios with an
> additional refcount?

I wonder if I’ve actually seen this bug. I’ve occasionally seen CPU page
faults spin forever, which could be caused by the page’s refcount being
accidentally elevated here. It’s hard to reproduce and appears random,
so I don’t have a real analysis of what’s happening in this case.
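
To make the imbalance Zi describes concrete, here is a toy userspace
model, not kernel code. Every toy_* name is made up for illustration,
and modelling a successful split as a plain decrement is an assumption
that glosses over how the real split hands the reference to the
sub-folios.

```c
/* Toy userspace model, NOT kernel code: every toy_* name is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_folio {
	int refcount;
};

static void toy_folio_get(struct toy_folio *f)
{
	f->refcount++;
}

static void toy_folio_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Stand-in for the split step. Assumption: on success the caller's extra
 * reference is consumed (modelled as a plain decrement); on failure the
 * refcount is left untouched and -EAGAIN is returned.
 */
static int toy_split(struct toy_folio *f, bool fail)
{
	if (fail)
		return -EAGAIN;
	f->refcount--;
	return 0;
}

/*
 * Mirrors the shape of the patched function: take a reference, try the
 * split, and optionally drop the reference again on the error path.
 */
static int toy_split_unmapped(struct toy_folio *f, bool fail,
			      bool put_on_error)
{
	int ret;

	toy_folio_get(f);
	ret = toy_split(f, fail);
	if (ret) {
		if (put_on_error)
			toy_folio_put(f);
		return ret;
	}
	return 0;
}
```

Starting from a refcount of 1, a failed split with put_on_error == false
leaves the count at 2, while put_on_error == true brings it back to 1.
In this model only the second case is balanced on the error path.
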
Matt
>
>
> Best Regards,
> Yan, Zi