Re: [PATCH] mm/migrate_device: fix folio refcount leak on folio_split_unmapped failure

From: Zi Yan

Date: Thu Mar 05 2026 - 11:39:57 EST


On 5 Mar 2026, at 11:36, Usama Arif wrote:

> On 05/03/2026 12:09, Mika Penttilä wrote:
>> On 3/5/26 13:44, Usama Arif wrote:
>>
>>>
>>> On 05/03/2026 06:09, Mika Penttilä wrote:
>>>> Hi!
>>>>
>>>> On 3/5/26 01:28, Usama Arif wrote:
>>>>
>>>>> On 04/03/2026 22:09, Balbir Singh wrote:
>>>>>> On 3/5/26 08:54, Zi Yan wrote:
>>>>>>> On 4 Mar 2026, at 16:48, Balbir Singh wrote:
>>>>>>>
>>>>>>>> On 3/5/26 02:17, Zi Yan wrote:
>>>>>>>>> On 4 Mar 2026, at 7:01, Usama Arif wrote:
>>>>>>>>>
>>>>>>>>>> From: Usama Arif <usama.arif@xxxxxxxxx>
>>>>>>>>>>
>>>>>>>>>> migrate_vma_split_unmapped_folio() takes an extra reference via
>>>>>>>>>> folio_get() before calling folio_split_unmapped(). On success, the
>>>>>>>>>> split consumes this reference: __folio_freeze_and_split_unmapped()
>>>>>>>>>> expects the +1 in its folio_ref_freeze() check, and distributes it
>>>>>>>>>> across the resulting sub-folios via folio_ref_unfreeze(...+1), which
>>>>>>>>>> are later balanced by folio_put() calls in __migrate_device_finalize().
>>>>>>>>>>
>>>>>>>>>> If folio_split_unmapped() fails (e.g., unexpected pinning returns
>>>>>>>>>> -EAGAIN), the function returns without calling folio_put(). The extra
>>>>>>>>>> reference is never released.
>>>>>>>>>>
>>>>>>>>>> Add the missing folio_put() on the error path.
>>>>>>>>>>
>>>>>>>>>> Fixes: 4265d67e405a4 ("mm/migrate_device: add THP splitting during migration")
>>>>>>>>>> Closes: https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@xxxxxxxxxxxxxx/
>>>>>>>>>> Reported-by: Nico Pache <npache@xxxxxxxxxx>
>>>>>>>>>> Signed-off-by: Usama Arif <usama.arif@xxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>> mm/migrate_device.c | 4 +++-
>>>>>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>>>>>>>> index 0a8b31939640f..351ecd9065d13 100644
>>>>>>>>>> --- a/mm/migrate_device.c
>>>>>>>>>> +++ b/mm/migrate_device.c
>>>>>>>>>> @@ -917,8 +917,10 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
>>>>>>>>>> folio_get(folio);
>>>>>>>>>> split_huge_pmd_address(migrate->vma, addr, true);
>>>>>>>>>> ret = folio_split_unmapped(folio, 0);
>>>>>>>>>> - if (ret)
>>>>>>>>>> + if (ret) {
>>>>>>>>>> + folio_put(folio);
>>>>>>>>>> return ret;
>>>>>>>>>> + }
>>>>>>>>>> migrate->src[idx] &= ~MIGRATE_PFN_COMPOUND;
>>>>>>>>>> flags = migrate->src[idx] & ((1UL << MIGRATE_PFN_SHIFT) - 1);
>>>>>>>>>> pfn = migrate->src[idx] >> MIGRATE_PFN_SHIFT;
>>>>>>>>>> --
>>>>>>>>>> 2.47.3
>>>>>>>>> Add Balbir, who wrote the code, to comment on this.
>>>>>>>>>
>>>>>>>> Thanks Zi!
>>>>>>>>
>>>>>>>> Just wondering if there is a reproducer for the issue and how the fix was tested?
>>>>>>>> I expect migrate_vma_finalize() to be called for folios, even when the split failed,
>>>>>>>> and to drop the lock.
>>>>>>> Does migrate_vma_finalize() do folio_put() for failed-to-split folios?
>>>>>>> If so, how does it distinguish between split folios and failed-to-split folios?
>>>>>>> By comparing source and destination folio orders?
>>>>>>>
>>>>>> We reset the MIGRATE_PFN_MIGRATE flag for pfns that fail to migrate. We do a folio_put
>>>>>> on the src in finalize, and if it was split, on all the split folios as well.
>>>>>>
>>>>>>> What we see from migrate_vma_split_unmapped_folio() is that
>>>>>>> it adds a refcount for all input folios, but only drops a refcount
>>>>>>> for the split folio. Doesn't that cause failed-to-split folios to have
>>>>>>> an additional refcount?
>>>>>>>
>>>>> Hello!
>>>>>
>>>>> Thanks for reviewing, everyone. It's very difficult to create a reproducer: I think
>>>>> the extra reference would need to appear after migrate_device_unmap() but before
>>>>> folio_split_unmapped() in migrate_vma_pages()? That's hard to trigger reliably from
>>>>> userspace.
>>>>>
>>>>> The fix came about when Nico indicated there might be an issue if split_huge_pmd_address
>>>>> fails in my patch [1].
>>>>>
>>>>> Below is my understanding of how refcounting is working over here step by step. I
>>>>> might very well be wrong on this, and the refcounting is a bit all over the place
>>>>> and I might miss a reference change somewhere so would really appreciate if someone
>>>>> can confirm this!
>>>>>
>>>>>
>>>>> 1. migrate_vma_collect_huge_pmd():
>>>>> a) folio_get(folio) -> +1 (collect reference)
>>>>> 2. migrate_device_unmap():
>>>>> a) folio_isolate_lru() -> +1 (isolation reference)
>>>>> b) folio_put() -> -1 (drops the collect reference)
>>>>>
>>>>>
>>>>> Without this patch:
>>>>>
>>>>> 3. migrate_vma_split_unmapped_folio():
>>>>> a) folio_get(folio) -> +1 (split reference)
>>>>> b) folio_split_unmapped() -> fails
>>>>> c) Returns error — without folio_put() which is the fix
>>>>> 4. Caller in migrate_vma_pages(): clears MIGRATE_PFN_MIGRATE | MIGRATE_PFN_COMPOUND
>>>>> 5. __migrate_device_finalize(): sees !(src_pfns[i] & MIGRATE_PFN_MIGRATE), restores the folio:
>>>>> a) remove_migration_ptes(src, src) — re-establishes user PTEs
>>>>> b) folio_unlock(src)
>>>>> c) folio_put(src) -> -1 (drops the isolation reference)
>>>>>
>>>>> The split reference in 3.a is never released and the folio has a permanently elevated refcount.
>>>>> Unless I missed a folio_put somewhere for the refcount increase in folio_isolate_lru() (2.a)?
>>>>>
>>>>> Please let me know if this makes sense!
>>>>>
>>>>> [1] https://lore.kernel.org/all/CAA1CXcDyqPPwf_-W7B+PFQtL8HdoJGCEqVsVxq7DhOUB=L4PQA@xxxxxxxxxxxxxx/
>>>>>
>>>>>> Thanks! Yes, the patch makes sense
>>>>>>
>>>>>> Acked-by: Balbir Singh <balbirs@xxxxxxxxxx>
>>>>>>
>>>>>> Balbir
>>>> I remember stumbling on this a while ago also. The folio_get() in migrate_vma_split_unmapped_folio()
>>>> is balanced with put_page() in __split_huge_pmd_locked() (freeze = true), can't fail for device pages.
>>>> Folios at this point are unmapped but have 1 refcount from "collecting".
>>>> After folio_split_unmapped() the refcount(s) is still 1.
>>>>
>>>> So it seems the code is good as is? A comment though would be good for the extra folio_get..
>>>>
>>> Hmm, I don't think the put_page() in __split_huge_pmd_locked() is there to balance the folio_get() in
>>> migrate_vma_split_unmapped_folio(). There are other points where split_huge_pmd_locked() is called
>>> with freeze = true [1] and they don't get a reference before calling split_huge_pmd.
>>>
>>> I think the folio_put() in __split_huge_pmd_locked() freeze = true case is there as migration
>>> entries are being installed?
>>>
>>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/rmap.c#L2334
>>>
>>>
>> Yes, normally you want to drop the reference when installing migration entries, but in this context
>> you have already done the collecting for the THP folio, and you want the folio_get() to balance
>> the put_page() so that the refs stay unchanged. Is that right, Balbir?
>>
>> --Mika
>>
>
> Hi Mika,
>
> You are right, This patch is wrong. I tried the below diff to force folio_split_unmapped to return
> -EAGAIN. I ran tools/testing/selftests/mm/hmm-tests -r hmm.hmm_device_private.migrate_anon_huge_err
> to trigger the path for folio_split_unmapped.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8e2746ea74adf..6df33b4990a13 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4140,6 +4140,8 @@ int folio_split_unmapped(struct folio *folio, unsigned int new_order)
> if (folio_expected_ref_count(folio) != folio_ref_count(folio) - 1)
> return -EAGAIN;
>
> + return -EAGAIN;
> +
> local_irq_disable();
> ret = __folio_freeze_and_split_unmapped(folio, new_order, &folio->page, NULL,
> NULL, false, NULL, SPLIT_TYPE_UNIFORM,
>
>
>
> I inserted a lot of traces to keep track of refcounts [1]. Without this patch, I get
> ....
> hmm-tests-129 [000] ..... 1.476233: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffc536e2c4100000 refcount=0 AFTER error NO folio_put
> hmm-tests-129 [000] ..... 1.476234: __migrate_device_pages: PAGES: split FAILED folio=ffc536e2c4100000 refcount=0
> hmm-tests-129 [000] ..... 1.476236: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 dst=ffc536e2c4100000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=0 migrate=0 BEFORE remove_migration_ptes
> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=1 mapcount=0 AFTER remove_migration_ptes
> hmm-tests-129 [000] ..... 1.476237: __migrate_device_finalize: FINALIZE[0]: src=ffc536e2c4100000 refcount=0 AFTER folio_put(src)
>
> i.e. refcount = 512, which is correct as split_huge_pmd_address was successful. Full output is
> at [2].
>
> With this patch, I get:
>
> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_FILEPAGES val:-511 Comm:bash Pid:63
> BUG: Bad rss-counter state mm:00000000cfe88d5e type:MM_ANONPAGES val:511 Comm:bash Pid:63
> ...
> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: SPLIT_UNMAPPED: folio=ffed210c840f0000 refcount=1 AFTER error folio_put FIX PRESENT
> hmm-tests-129 [000] ..... 1.468315: __migrate_device_pages: PAGES: split FAILED folio=ffed210c840f0000 refcount=1
> hmm-tests-129 [000] ..... 1.468318: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 dst=ffed210c840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=513 mapcount=512 AFTER remove_migration_ptes
> hmm-tests-129 [000] ..... 1.468357: __migrate_device_finalize: FINALIZE[0]: src=ffed210c840f0000 refcount=512 AFTER folio_put(src)
>
> refcount=0 means the folio would be freed which is not correct. The full output is at [3].
>
> Thank you for clearing this up!

Thank you for doing the investigation. Can you send a patch to add a comment
in migrate_vma_split_unmapped_folio() about this to avoid the confusion
in the future?
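
For whoever writes that comment, a minimal userspace sketch of the accounting
as described in this thread may help. It is only a toy model, not kernel code:
every model_* name is hypothetical, 512 sub-pages per PMD-sized folio is an
assumption (4K base pages, order 9), and it assumes the folio enters
migrate_vma_split_unmapped_folio() holding a single reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 512 base pages per PMD-sized folio (4K pages, order 9). */
#define MODEL_NR_SUBPAGES 512

/* Hypothetical toy model: only the reference count is tracked. */
struct model_folio {
	int refcount;
};

static void model_get(struct model_folio *f)
{
	f->refcount++;
}

static void model_put(struct model_folio *f)
{
	assert(f->refcount > 0);	/* a put on an already-freed folio is a bug */
	f->refcount--;
}

/*
 * Refcount right after the failed-split error path.
 * with_extra_put selects the behaviour of the proposed patch.
 */
static int model_refcount_after_failed_split(bool with_extra_put)
{
	/* Assumption: one reference is held on entry. */
	struct model_folio f = { .refcount = 1 };

	model_get(&f);		/* folio_get() before the split */
	model_put(&f);		/* put_page() in the PMD split, freeze == true */
	/* folio_split_unmapped() fails here and changes nothing in this model */
	if (with_extra_put)
		model_put(&f);	/* the extra folio_put() from the patch */
	return f.refcount;
}

/* Refcount after finalize restores the PTEs, without the extra put. */
static int model_refcount_after_finalize(void)
{
	struct model_folio f = {
		.refcount = model_refcount_after_failed_split(false),
	};

	f.refcount += MODEL_NR_SUBPAGES;	/* remove_migration_ptes(): one ref per PTE */
	model_put(&f);				/* folio_put(src) in finalize */
	return f.refcount;
}
```

In this model the unpatched error path leaves one reference, and finalize ends
at 512 (one per restored PTE), while the extra put drops the count to zero
before finalize runs.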

>
>
> [1] https://gist.github.com/uarif1/65e1e816af7aa0ae38dd6ec64d62a993
> [2] https://gist.github.com/uarif1/79ea9500667daa4e2ef09cb5d308f041
> [3] https://gist.github.com/uarif1/8a35a6c65ba8b3a1c1dfe72dc30e821d


Best Regards,
Yan, Zi