Re: [PATCH] mm: migrate: requeue destination folio on deferred split queue
From: Zi Yan
Date: Fri Mar 06 2026 - 11:29:09 EST
On 6 Mar 2026, at 11:15, Usama Arif wrote:
> On 06/03/2026 14:46, Zi Yan wrote:
>> On 6 Mar 2026, at 9:12, Usama Arif wrote:
>>
>>> On 06/03/2026 13:49, David Hildenbrand (Arm) wrote:
>>>> On 3/6/26 14:35, Usama Arif wrote:
>>>>> During folio migration, __folio_migrate_mapping() removes the source
>>>>> folio from the deferred split queue, but the destination folio is never
>>>>> re-queued. This causes underutilized THPs to escape the shrinker after
>>>>> NUMA migration, since they silently drop off the deferred split list.
>>>>>
>>>>> Fix this by calling deferred_split_folio() on the destination folio
>>>>> after a successful migration, for large rmappable folios.
>>>>>
>>>>> Reported-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>>>>> Fixes: dafff3f4c850 ("mm: split underused THPs")
>>>>> Signed-off-by: Usama Arif <usama.arif@xxxxxxxxx>
>>>>> ---
>>>>> mm/migrate.c | 11 +++++++++++
>>>>> 1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index ece77ccb2ec0..98d0a594f7b7 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -1393,6 +1393,17 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>>>>  	if (old_page_state & PAGE_WAS_MAPPED)
>>>>>  		remove_migration_ptes(src, dst, 0);
>>>>>
>>>>> +	/*
>>>>> +	 * Requeue the destination folio on the deferred split queue if
>>>>> +	 * the source was a large folio that was on the queue. Without
>>>>> +	 * this, NUMA migration causes underutilized THPs to escape
>>>>> +	 * the shrinker since the source is unqueued in
>>>>> +	 * __folio_migrate_mapping() and the destination is never
>>>>> +	 * re-queued.
>>>>> +	 */
>>>>> +	if (folio_test_large(dst) && folio_test_large_rmappable(dst))
>>>>> +		deferred_split_folio(dst, false);
>>>>
>>>> Doesn't that mean that you will readd any large folios, even if already
>>>> previously taken off the list after scanning?
>>>>
>>>> So I am not sure if your "if the source was a large folio that was on
>>>> the queue." comment is accurate?
>>>>
>>>
>>> Yes, you are right. How about something like below? We also won't need to check
>>> for anon and non-device folios with this, as we only set the flag if it was
>>> already on the deferred_split list.
>>
>> BTW, migrate_pages() tries to split partially mapped folios before migration[1],
>> so what remains in the deferred_list would be:
>>
>> 1. partially mapped but with a pin,
>> 2. fully mapped but potentially underused.
>>
>
> Yes, that's right.
>
>> I wonder if you want to do an underused scan before migration and try to split
>> underused THPs.
>
> Hmm, I think we should keep THPs as is if there is no memory pressure (proactive
> or otherwise). Scanning THPs for zeros has a cost, and we would also lose the benefit
> of THPs when we don't need memory.
Makes sense.
>
>> Or to avoid this additional scan, find a way of detecting
>> zero pages at page copy time and split it after migration.
>>
>
> Yeah, but I think we lose the benefits of THPs after migration when we don't need
> additional memory?
Right.
>
>> Anyway, it seems that all large folios are in this deferred_list. Maybe, like
>> David suggested in his LSFMM proposal, we should scan large folios on LRU lists
>> at reclaim time instead, since there is not much difference between deferred_list
>> and LRU lists right now.
>>
>
> Yeah the THP shrinker is a very basic implementation and there are a lot of
>
>>
>> [1] https://elixir.bootlin.com/linux/v6.19.3/source/mm/migrate.c#L1840
>>
>
> Also, Johannes pointed out it's not great storing this information in page flags;
> we can just keep it as a local variable. This is what the patch would look like:
>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..48a972f158ab 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1360,6 +1360,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	int rc;
>  	int old_page_state = 0;
>  	struct anon_vma *anon_vma = NULL;
> +	bool src_deferred_split = false;
>  	struct list_head *prev;
>
>  	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
> @@ -1373,6 +1374,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  		goto out_unlock_both;
>  	}
>
> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> +	    !data_race(list_empty(&src->_deferred_list)))
> +		src_deferred_split = true;
> +
>  	rc = move_to_new_folio(dst, src, mode);
>  	if (rc)
>  		goto out;
> @@ -1393,6 +1398,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	if (old_page_state & PAGE_WAS_MAPPED)
>  		remove_migration_ptes(src, dst, 0);
>
> +	/*
> +	 * Requeue the destination folio on the deferred split queue if
> +	 * the source was on the queue. The source is unqueued in
> +	 * __folio_migrate_mapping(), so we recorded the state from
> +	 * before move_to_new_folio().
> +	 */
> +	if (src_deferred_split)
> +		deferred_split_folio(dst, false);
> +
>  out_unlock_both:
>  	folio_unlock(dst);
>  	folio_set_owner_migrate_reason(dst, reason);
LGTM. Thanks for improving it.
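
For anyone following along, here is a rough userspace sketch of the pattern
in the patch above: sample the source's queue membership before the move
(because the move unqueues the source), then requeue the destination only if
the source had been queued. This is a toy model, not kernel code. All names
in it (toy_folio, toy_queue(), toy_move(), toy_migrate(), the list helpers)
are hypothetical stand-ins, and it ignores locking, refcounts, and the
partially_mapped argument of deferred_split_folio().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's circular struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Toy folio: only the two properties the patch looks at. */
struct toy_folio {
	bool large;			/* models large + large_rmappable */
	struct list_node deferred;	/* models _deferred_list */
};

static struct list_node split_queue;	/* models the deferred split queue */

static void toy_split_queue_init(void)
{
	list_init(&split_queue);
}

static void toy_folio_init(struct toy_folio *f, bool large)
{
	f->large = large;
	list_init(&f->deferred);
}

/* Models queueing a folio; a folio already on the queue is left alone. */
static void toy_queue(struct toy_folio *f)
{
	if (f->large && list_is_empty(&f->deferred))
		list_add_tail_node(&f->deferred, &split_queue);
}

/* Models the move step, which takes the source off the queue. */
static void toy_move(struct toy_folio *dst, struct toy_folio *src)
{
	dst->large = src->large;
	if (!list_is_empty(&src->deferred))
		list_del_init_node(&src->deferred);
}

/* Record the source state first, move, then requeue the destination. */
static void toy_migrate(struct toy_folio *dst, struct toy_folio *src)
{
	bool src_deferred_split;

	assert(dst != src);
	src_deferred_split = src->large && !list_is_empty(&src->deferred);

	toy_move(dst, src);

	if (src_deferred_split)
		toy_queue(dst);
}
```

Checking the source after toy_move() would always see an empty list node,
which is why the flag has to be sampled up front. A source that was never
queued leaves the destination unqueued as well, which is the behaviour David
asked about for folios already taken off the list.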
Best Regards,
Yan, Zi