Re: [syzbot] [mm?] WARNING in deferred_split_folio

From: Lance Yang

Date: Wed Apr 01 2026 - 07:36:11 EST



On Wed, Apr 01, 2026 at 01:00:13PM +0200, David Hildenbrand (Arm) wrote:
>On 4/1/26 12:53, Lance Yang wrote:
>>
>> +Cc Deepanshu
>>
>> On Wed, Apr 01, 2026 at 12:16:43PM +0200, David Hildenbrand (Arm) wrote:
>>> On 4/1/26 10:59, Lance Yang wrote:
>>>>
>>>> >from another sharer can then remove some of those mappings and reach
>>>>
>>>> Perhaps the WARN is simply too strict there :)
>>>>
>>>> Migration already holds the folio lock on dst, while the competing
>>>> rmap-removal path runs under the page-table lock. So once
>>>> remove_migration_ptes(src, dst, 0) makes dst visible again, this race
>>>> looks hard to avoid.
>>>>
>>>> So maybe the simplest fix is just to drop the WARN in the
>>>> !partially_mapped path:
>>>>
>>>> ---8<---
>>>> Subject: [PATCH 1/1] mm/thp: avoid false warning in deferred_split_folio()
>>>>
>>>> From: Lance Yang <lance.yang@xxxxxxxxx>
>>>>
>>>> migrate_folio_move() snapshots src_partially_mapped from src before
>>>> migration and later requeues dst after remove_migration_ptes(src, dst, 0).
>>>>
>>>> Once dst is visible again, a competing rmap-removal path can legally set
>>>> PG_partially_mapped before the migration path reaches
>>>> deferred_split_folio(dst, src_partially_mapped).
>>>>
>>>> Migration already holds the folio lock on dst, while the competing
>>>> rmap-removal path runs under the page-table lock. So once
>>>> remove_migration_ptes(src, dst, 0) makes dst visible again, this race
>>>> looks hard to avoid.
>>>>
>>>> So just drop the WARN in the !partially_mapped path and preserve an
>>>> already-set PG_partially_mapped bit.
>>>>
>>>> Link: https://lore.kernel.org/linux-mm/69ccb65b.050a0220.183828.003a.GAE@xxxxxxxxxx/
>>>> Fixes: 8a8ca142a488 ("mm: migrate: requeue destination folio on deferred split queue")
>>>> Reported-by: syzbot+a7067a757858ac8eb085@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>> Signed-off-by: Lance Yang <lance.yang@xxxxxxxxx>
>>>> ---
>>>> mm/huge_memory.c | 3 ---
>>>> 1 file changed, 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 745eb3d0d4a7..8ea8e293dc7c 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -4433,9 +4433,6 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
>>>> mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
>>>>
>>>> }
>>>> - } else {
>>>> - /* partially mapped folios cannot become non-partially mapped */
>>>> - VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
>>>> }
>>>
>>> Can't we simply move the setting before restoring migration ptes?
>>
>> Afraid not, it closes the remove_migration_ptes() ->
>> deferred_split_folio() race, but opens a new one with the shrinker, IIUC
>>
>> Once dst is on the deferred split queue, deferred_split_scan() can
>> pick it up immediately. The shrinker unconditionally dequeues every
>> folio it visits:
>>
>> list_del_init(&folio->_deferred_list); /* always */
>>
>> Then for a non-partially-mapped folio, if folio_trylock() fails
>> (dst is still locked by migration), it falls through to:
>>
>> next:
>> if (did_split || !folio_test_partially_mapped(folio))
>> continue; /* not requeued, dst silently lost */
>>
>> so it is *not* requeued.
>
>How is that different to the shrinker just trying to lock the folio before we
>unlock it and failing? The race already exists?

Ouch, you're right, I was wrong - the trylock drop is a pre-existing
issue, not caused by the reorder ;)

>
>To sort out that race a trylock must not result in the folio getting
>discarded.

Nice, LGTM!

Given that the "trylock -> drop" behavior seems to exist already today,
do you think it's worth fixing that together with the reorder?

>diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>index ff9a42abd1b6..521989517cd1 100644
>--- a/mm/huge_memory.c
>+++ b/mm/huge_memory.c
>@@ -4558,7 +4558,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> goto next;
> }
> if (!folio_trylock(folio))
>- goto next;
>+ goto requeue:
> if (!split_folio(folio)) {
> did_split = true;
> if (underused)
>@@ -4569,6 +4569,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
> next:
> if (did_split || !folio_test_partially_mapped(folio))
> continue;
>+requeue:
> /*
> * Only add back to the queue if folio is partially mapped.
> * If thp_underused returns false, or if split_folio fails
>
>