Re: [PATCH mm-unstable 1/1] mm: fix deferred split queue races during migration
From: David Hildenbrand (Arm)
Date: Wed Apr 01 2026 - 14:55:40 EST

On 4/1/26 18:28, Usama Arif wrote:
> On Wed, 1 Apr 2026 21:10:32 +0800 Lance Yang <lance.yang@xxxxxxxxx> wrote:
>
>> From: Lance Yang <lance.yang@xxxxxxxxx>
>>
>> migrate_folio_move() records the deferred split queue state from src and
>> replays it on dst. Replaying it after remove_migration_ptes(src, dst, 0)
>> makes dst visible before it is requeued, so a concurrent rmap-removal path
>> can mark dst partially mapped and trip the WARN in deferred_split_folio().
>>
>> Move the requeue before remove_migration_ptes() so dst is back on the
>> deferred split queue before it becomes visible again.
>>
>> Because migration still holds dst locked at that point, teach
>> deferred_split_scan() to requeue a folio when folio_trylock() fails.
>> Otherwise a fully mapped underused folio can be dequeued by the shrinker
>> and silently lost from split_queue.
>>
>> Link: https://syzkaller.appspot.com/bug?extid=a7067a757858ac8eb085
>> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
>> Reported-by: syzbot+a7067a757858ac8eb085@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Closes: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@xxxxxxxxxx/
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> Suggested-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
>> Signed-off-by: Lance Yang <lance.yang@xxxxxxxxx>
>> ---
>>
>> [ Backport note ]
>> This patch is a follow-up fix for 8a8ca142a488 ("mm: migrate: requeue
>> destination folio on deferred split queue"), which is currently only in
>> mm-stable, and should be backported together with it.
>>
>> Credit for this fix goes to David, thanks!
>>
>> mm/huge_memory.c | 12 +++++++-----
>> mm/migrate.c | 18 +++++++++---------
>> 2 files changed, 16 insertions(+), 14 deletions(-)
>>
>
>
> Thanks for the fix! And sorry for introducing the bug in
> migrate_folio_move() :)
>
> So I am happy with the migrate_folio_move() change, it makes sense.
>
> The "goto next" when the folio is locked in deferred_split_scan() was
> actually on purpose. The reasoning was that if the folio is locked, we
> consider it in use by someone and therefore we shouldn't split it. Even
> though thp_underused() does a zero-filled check, the whole point of the
> shrinker was to split THPs that are "not in use", and in my mind, a locked
> folio is a folio in use. So I'm not sure about that change...

That is a questionable assessment. It's about checking whether folios
are *underused*, not whether they are in use by someone in the system
(e.g., migration).

Just take a look at when anonymous folios are actually locked :)

So the original locked handling here is just bogus.
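
To make the "silently lost from split_queue" case from the patch description concrete, here is a rough userspace sketch. It is not the kernel code and not the actual diff: struct toy_folio, struct toy_queue and toy_scan() are hypothetical names, the "locked" flag merely stands in for a failed folio_trylock(), and dropping non-underused folios from the queue is a simplification of this toy. The only point it illustrates is that a scan pass which dequeues a folio and then fails to lock it should put the folio back rather than forget it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
	bool locked;		/* models folio_trylock() failing */
	bool underused;		/* models the result of thp_underused() */
	bool queued;		/* true while on the toy queue */
	struct toy_folio *next;
};

/* Hypothetical stand-in for the deferred split queue (simple FIFO). */
struct toy_queue {
	struct toy_folio *head;
	struct toy_folio *tail;
	size_t len;
};

static void toy_enqueue(struct toy_queue *q, struct toy_folio *f)
{
	assert(!f->queued);	/* a folio is on the queue at most once */
	f->next = NULL;
	if (q->tail)
		q->tail->next = f;
	else
		q->head = f;
	q->tail = f;
	f->queued = true;
	q->len++;
}

static struct toy_folio *toy_dequeue(struct toy_queue *q)
{
	struct toy_folio *f = q->head;

	if (!f)
		return NULL;
	q->head = f->next;
	if (!q->head)
		q->tail = NULL;
	f->next = NULL;
	f->queued = false;
	q->len--;
	return f;
}

/*
 * One scan pass over the folios that were queued at entry.
 * Returns how many folios were "split" (here: just removed).
 */
static size_t toy_scan(struct toy_queue *q)
{
	size_t n = q->len, split = 0;

	while (n--) {
		struct toy_folio *f = toy_dequeue(q);

		if (f->locked) {
			/* Lock not available: requeue instead of losing it. */
			toy_enqueue(q, f);
			continue;
		}
		if (f->underused)
			split++;	/* models a successful split */
		/* Toy simplification: anything else just leaves the queue. */
	}
	return split;
}
```

With one locked and one unlocked underused folio queued, a first toy_scan() splits only the unlocked one and leaves the locked one on the queue; once the lock holder is done (locked = false), the next pass picks it up. Without the requeue branch the locked folio would simply vanish from the queue after the first pass.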
--
Cheers,
David