Re: [syzbot] [mm?] WARNING in deferred_split_folio

From: David Hildenbrand (Arm)

Date: Wed Apr 01 2026 - 07:04:40 EST


On 4/1/26 12:53, Lance Yang wrote:
>
> +Cc Deepanshu
>
> On Wed, Apr 01, 2026 at 12:16:43PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/1/26 10:59, Lance Yang wrote:
>>>
>>> >from another sharer can then remove some of those mappings and reach
>>>
>>> Perhaps the WARN is simply too strict there :)
>>>
>>> Migration already holds the folio lock on dst, while the competing
>>> rmap-removal path runs under the page-table lock. So once
>>> remove_migration_ptes(src, dst, 0) makes dst visible again, this race
>>> looks hard to avoid.
>>>
>>> So maybe the simplest fix is just to drop the WARN in the
>>> !partially_mapped path:
>>>
>>> ---8<---
>>> Subject: [PATCH 1/1] mm/thp: avoid false warning in deferred_split_folio()
>>>
>>> From: Lance Yang <lance.yang@xxxxxxxxx>
>>>
>>> migrate_folio_move() snapshots src_partially_mapped from src before
>>> migration and later requeues dst after remove_migration_ptes(src, dst, 0).
>>>
>>> Once dst is visible again, a competing rmap-removal path can legally set
>>> PG_partially_mapped before the migration path reaches
>>> deferred_split_folio(dst, src_partially_mapped).
>>>
>>> Migration already holds the folio lock on dst, while the competing
>>> rmap-removal path runs under the page-table lock. So once
>>> remove_migration_ptes(src, dst, 0) makes dst visible again, this race
>>> looks hard to avoid.
>>>
>>> So just drop the WARN in the !partially_mapped path and preserve an
>>> already-set PG_partially_mapped bit.
>>>
>>> Link: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@xxxxxxxxxx/
>>> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
>>> Reported-by: syzbot+a7067a757858ac8eb085@xxxxxxxxxxxxxxxxxxxxxxxxx
>>> Signed-off-by: Lance Yang <lance.yang@xxxxxxxxx>
>>> ---
>>> mm/huge_memory.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 745eb3d0d4a7..8ea8e293dc7c 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -4433,9 +4433,6 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
>>> mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
>>>
>>> }
>>> - } else {
>>> - /* partially mapped folios cannot become non-partially mapped */
>>> - VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
>>> }
>>
>> Can't we simply move the setting before restoring migration ptes?
>
> Afraid not, it closes the remove_migration_ptes() ->
> deferred_split_folio() race, but opens a new one with the shrinker, IIUC
>
> Once dst is on the deferred split queue, deferred_split_scan() can
> pick it up immediately. The shrinker unconditionally dequeues every
> folio it visits:
>
> list_del_init(&folio->_deferred_list); /* always */
>
> Then for a non-partially-mapped folio, if folio_trylock() fails
> (dst is still locked by migration), it falls through to:
>
> next:
> if (did_split || !folio_test_partially_mapped(folio))
> continue; /* not requeued, dst silently lost */
>
> so it is *not* requeued.

How is that different from the shrinker trying to lock the folio before we
unlock it, and failing? Doesn't that race already exist today?

To sort out that race, a failed trylock must not result in the folio getting
discarded.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ff9a42abd1b6..521989517cd1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4558,7 +4558,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
goto next;
}
if (!folio_trylock(folio))
- goto next;
+ goto requeue;
if (!split_folio(folio)) {
did_split = true;
if (underused)
@@ -4569,6 +4569,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
next:
if (did_split || !folio_test_partially_mapped(folio))
continue;
+requeue:
/*
* Only add back to the queue if folio is partially mapped.
* If thp_underused returns false, or if split_folio fails
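
To illustrate the difference, here is a rough userspace model of one shrinker
visit. It is only a sketch, not kernel code: struct model_folio and all
model_* helpers are made up for illustration, the thp_underused() check and
the actual split are not modelled, and the "locked" flag simply stands in for
migration still holding the folio lock on dst.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct folio; only the bits this model needs. */
struct model_folio {
	bool partially_mapped;	/* models PG_partially_mapped */
	bool locked;		/* models the folio lock being held elsewhere */
	bool queued;		/* models membership in the deferred split queue */
};

/* Models folio_trylock(): fails if another path already holds the lock. */
static bool model_trylock(struct model_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

/*
 * Models one shrinker visit to a folio. requeue_on_trylock_failure selects
 * between the current behaviour (false) and the proposed one (true).
 */
static void model_scan_one(struct model_folio *f, bool requeue_on_trylock_failure)
{
	bool did_split = false;

	f->queued = false;		/* list_del_init(): always dequeued */

	if (!model_trylock(f)) {
		if (requeue_on_trylock_failure)
			goto requeue;
		goto next;
	}
	/* The split is not modelled, so did_split stays false. */
	f->locked = false;
next:
	if (did_split || !f->partially_mapped)
		return;			/* not requeued */
requeue:
	f->queued = true;
}

/*
 * Models the migration case: dst is queued, not partially mapped, and
 * still locked by migration when the shrinker visits it. Returns whether
 * dst is still on the queue afterwards.
 */
static bool model_dst_survives(bool requeue_on_trylock_failure)
{
	struct model_folio dst = {
		.partially_mapped = false,
		.locked = true,
		.queued = true,
	};

	model_scan_one(&dst, requeue_on_trylock_failure);
	return dst.queued;
}
```

In this model, model_dst_survives(false) returns false (dst is dropped from
the queue) and model_dst_survives(true) returns true (dst is put back).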


--
Cheers,

David