Re: [syzbot] [mm?] WARNING in memory_failure

From: David Hildenbrand
Date: Thu Oct 02 2025 - 03:25:53 EST


On 02.10.25 01:58, jane.chu@xxxxxxxxxx wrote:
> Hi, Zi Yan,
>
> On 9/30/2025 9:51 PM, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> lost connection to test machine
>>
>> Tested on:
>>
>> commit: d8795075 mm/huge_memory: do not change split_huge_page..
>> git tree: https://github.com/x-y-z/linux-dev.git fix_split_page_min_order-for-kernelci
>> console output: https://syzkaller.appspot.com/x/log.txt?x=17ce96e2580000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=714d45b6135c308e
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e6367ea2fdab6ed46056
>> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
>> userspace arch: arm64
>>
>> Note: no patches were applied.
>
> My hunch is that
> https://github.com/x-y-z/linux-dev.git
> fix_split_page_min_order-for-kernelci
> alone is not enough. Perhaps on ARM64, the page cache pages of
> /dev/nullb0 in the test case have min_order > 0, so the THP split
> fails, as the console message shows:
> [ 200.378989][T18221] Memory failure: 0x124d30: recovery action for unsplit thp: Failed
>
> With lots of poisoned THP pages stuck in the page cache, OOM could
> trigger too soon.
>
> I think it's worth trying the additional changes I suggested earlier:
> https://lore.kernel.org/lkml/7577871f-06be-492d-b6d7-8404d7a045e0@xxxxxxxxxx/

I think that makes sense. I said earlier that I didn't think splitting
makes sense at all in this case, but as you say, splitting at least
allows the remainder of the folio to be reclaimed, even though we cannot
go on to handle the remaining large folio later on.
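
To illustrate the page accounting behind this, here is a toy userspace
sketch (not kernel code). The names toy_split() and struct split_result
are hypothetical, the thread does not state the actual folio order or
min_order, and the model assumes a single poisoned page in the folio.

```c
#include <assert.h>

/* Hypothetical toy model of splitting a large folio; not a kernel API. */
struct split_result {
	unsigned long nr_folios;         /* folios produced by the split */
	unsigned long pages_per_folio;   /* 1 << new_order */
	unsigned long stuck_pages;       /* pages in the folio holding the poisoned page */
	unsigned long reclaimable_pages; /* pages in all other (clean) folios */
};

/*
 * Split a folio of order folio_order into folios of order new_order.
 * Returns 0 on success, or -1 if new_order is below the mapping's
 * minimum order (or above the folio's own order).
 * Assumption: exactly one page in the folio is poisoned.
 */
static int toy_split(unsigned int folio_order, unsigned int min_order,
		     unsigned int new_order, struct split_result *res)
{
	assert(res);

	if (new_order < min_order || new_order > folio_order)
		return -1;

	res->pages_per_folio = 1UL << new_order;
	res->nr_folios = 1UL << (folio_order - new_order);
	/* The one folio containing the poisoned page stays behind. */
	res->stuck_pages = res->pages_per_folio;
	/* Every other resulting folio is clean and can be reclaimed. */
	res->reclaimable_pages = (res->nr_folios - 1) * res->pages_per_folio;
	return 0;
}
```

With made-up values folio_order = 9 and min_order = 2, a split to
order 0 is refused, while a split to order 2 leaves 4 pages stuck with
the poisoned page and makes the other 508 pages reclaimable.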

--
Cheers

David / dhildenb