Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
From: Zi Yan
Date: Wed Apr 01 2026 - 18:56:32 EST
On 1 Apr 2026, at 15:21, Zi Yan wrote:
> On 1 Apr 2026, at 9:10, Lance Yang wrote:
>
>> From: Lance Yang <lance.yang@xxxxxxxxx>
>>
>> migrate_folio_move() records the deferred split queue state from src and
>> replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
>> makes dst visible before it is requeued, so a concurrent rmap-removal path
>> can mark dst partially mapped and trip the WARN in deferred_split_folio().
>>
>> Move the requeue before remove_migration_ptes() so dst is back on the
>> deferred split queue before it becomes visible again.
>>
>> Because migration still holds dst locked at that point, teach
>> deferred_split_scan() to requeue a folio when folio_trylock() fails.
>> Otherwise a fully mapped underused folio can be dequeued by the shrinker
>> and silently lost from split_queue.
>>
>> Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
>> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
>> Reported-by: syzbot+a7067a757858ac8eb085@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@xxxxxxxxxx/
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> Suggested-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
>> Signed-off-by: Lance Yang <lance.yang@xxxxxxxxx>
>> ---
>>
>> [ Backport note ]
>> This patch is a follow-up fix for 8a8ca142a488 ("mm: migrate: requeue
>> destination folio on deferred split queue"), which is currently only in
>> mm-stable, and should be backported together with it.
>>
>> Credit for this fix goes to David, thanks!
>>
>> mm/huge_memory.c | 12 +++++++-----
>> mm/migrate.c | 18 +++++++++---------
>> 2 files changed, 16 insertions(+), 14 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index ff9a42abd1b6..ac6d823e351f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4558,7 +4558,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>>  				goto next;
>>  		}
>>  		if (!folio_trylock(folio))
>> -			goto next;
>> +			goto requeue;
>>  		if (!split_folio(folio)) {
>>  			did_split = true;
>>  			if (underused)
>> @@ -4569,11 +4569,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>>  next:
>>  		if (did_split || !folio_test_partially_mapped(folio))
>>  			continue;
>> +requeue:
>>  		/*
>> -		 * Only add back to the queue if folio is partially mapped.
>> -		 * If thp_underused returns false, or if split_folio fails
>> -		 * in the case it was underused, then consider it used and
>> -		 * don't add it back to split_queue.
>> +		 * Add back partially mapped folios, or underused folios
>> +		 * that we could not lock this round. If thp_underused()
>> +		 * returns false, or if split_folio() succeeds, or if
>> +		 * split_folio() fails in the case it was underused, then
>> +		 * consider it used and don't add it back to split_queue.
>>  		 */
>
> Should the sentence
> “If thp_underused() returns false, or if split_folio() succeeds, or if
> split_folio() fails in the case it was underused, then
> consider it used and don't add it back to split_queue.”
> be moved to below label next?
>
> Since “thp_underused() returns false” is describing “if (!underused) goto next”,
> “split_folio() succeeds” is describing “did_split == true in the if”,
> “split_folio() fails in the case it was underused” is describing
> “did_split == false and !folio_test_partially_mapped(folio) in the if”.
>
> The first sentence matches the goto requeue for folio_trylock().
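
To make the mapping between the comment sentences and the branches easier to
check, here is a minimal userspace sketch of the per-folio decision as it
appears in the quoted hunks. It is only a model, not kernel code: the function
name would_requeue() and its bool parameters are hypothetical stand-ins for
the results of the real helpers, the partially-mapped state is assumed not to
change during the scan, and anything outside the quoted hunks (locking, queue
manipulation, other checks before the trylock) is left out.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the per-folio decision in the patched
 * deferred_split_scan(). Returns true if the folio would be added back
 * to split_queue. Each bool stands in for the result of the real helper
 * named next to it.
 */
static bool would_requeue(bool partially_mapped, /* folio_test_partially_mapped() */
			  bool underused,        /* thp_underused() */
			  bool trylock_ok,       /* folio_trylock() */
			  bool split_ok)         /* split_folio() returned 0 */
{
	bool did_split = false;

	/* A fully mapped folio is a split candidate only if it is underused. */
	if (!partially_mapped && !underused)
		goto next;
	/* Could not lock it this round (e.g. migration holds the lock). */
	if (!trylock_ok)
		goto requeue;
	if (split_ok)
		did_split = true;
next:
	/* Drop the folio if it was split or is not partially mapped. */
	if (did_split || !partially_mapped)
		return false;
requeue:
	return true;
}
```

In this model, the three "don't add it back" cases all return through the
check under next:, while the trylock failure is the only path that jumps
straight to requeue:, which is why the second sentence of the comment reads
more naturally below the next: label.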

Hi Andrew,

Could you apply the fixup below, which moves the comment? Lance told me he
will be away for a while, so he cannot send the fixup himself.

Thanks.