Re: [syzbot] [mm?] WARNING in memory_failure
From: jane . chu
Date: Thu Oct 02 2025 - 13:48:14 EST
On 10/2/2025 6:54 AM, Zi Yan wrote:
On 2 Oct 2025, at 1:23, jane.chu@xxxxxxxxxx wrote:
On 10/1/2025 7:04 PM, Zi Yan wrote:
On 1 Oct 2025, at 20:38, Zi Yan wrote:
On 1 Oct 2025, at 19:58, jane.chu@xxxxxxxxxx wrote:
Hi, Zi Yan,
On 9/30/2025 9:51 PM, syzbot wrote:
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
lost connection to test machine
Tested on:
commit: d8795075 mm/huge_memory: do not change split_huge_page..
git tree: https://github.com/x-y-z/linux-dev.git fix_split_page_min_order-for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=17ce96e2580000
kernel config: https://syzkaller.appspot.com/x/.config?x=714d45b6135c308e
dashboard link: https://syzkaller.appspot.com/bug?extid=e6367ea2fdab6ed46056
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
userspace arch: arm64
Note: no patches were applied.
Thank you for looking into this.
My hunch is that
https://github.com/x-y-z/linux-dev.git fix_split_page_min_order-for-kernelci
alone is not enough. Perhaps on ARM64, the page cache pages of /dev/nullb0 in
Yes, it only has the first patch, which fails a split if it cannot be
split to the intended order (order-0 in this case).
the test case probably have min_order > 0, so the THP split fails, as the console message shows:
[ 200.378989][T18221] Memory failure: 0x124d30: recovery action for unsplit thp: Failed
With lots of poisoned THP pages stuck in the page cache, OOM could trigger too soon.
That is my understanding too. Thanks for the confirmation.
I think it's worth trying to add the additional changes I suggested earlier -
https://lore.kernel.org/lkml/7577871f-06be-492d-b6d7-8404d7a045e0@xxxxxxxxxx/
So that in the madvise HWPOISON cases, large huge pages are split into smaller huge pages, and most of them remain usable in the page cache.
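To put a rough number on "most of them remain usable", here is a small user-space sketch of the arithmetic. It is not kernel code: toy_usable_pages_after_split() is a hypothetical helper, and it assumes that a successful split to min_order leaves only the one after-split folio containing the poisoned page unusable.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): number of base pages that
 * stay usable when a folio of the given order is split down to min_order
 * and only the after-split folio holding the poisoned page is lost.
 */
static size_t toy_usable_pages_after_split(unsigned order, unsigned min_order)
{
	/* Assumption: a folio cannot be split to an order above its own. */
	assert(min_order <= order);

	return ((size_t)1 << order) - ((size_t)1 << min_order);
}
```

Under these assumptions, an order-9 folio split to min_order 2 would keep 508 of its 512 base pages usable, whereas a failed split leaves the whole folio stuck.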
Yep, I am going to incorporate your suggestion as the second patch and make
syzbot check it again.
#syz test: https://github.com/x-y-z/linux-dev.git fix_split_page_min_order_and_opt_memory_failure-for-kernelci
There is a bug here,
    if (try_to_split_thp_page(p, new_order, false) || new_order) {
            res = -EHWPOISON;
            kill_procs_now(p, pfn, flags, folio); <---
If try_to_split_thp_page() succeeded on min_order, 'folio' should be retaken: folio = page_folio(page) before moving on to kill_procs_now().
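The stale-pointer problem can be illustrated with a toy user-space model. This is only a sketch, not kernel code: every toy_* name is hypothetical, folios are modeled as naturally aligned runs of page indices, and the split is assumed to be uniform.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Each toy page records which folio it currently belongs to. */
struct toy_page {
	size_t head;	/* index of the head page of the containing folio */
	unsigned order;	/* order of the containing folio */
};

static struct toy_page toy_mem[TOY_NPAGES];

/* Make pages [head, head + 2^order) one folio. */
static void toy_make_folio(size_t head, unsigned order)
{
	for (size_t i = head; i < head + ((size_t)1 << order); i++) {
		toy_mem[i].head = head;
		toy_mem[i].order = order;
	}
}

/* Toy counterpart of page_folio(): look up the current folio of a page. */
static size_t toy_page_folio(size_t pfn)
{
	return toy_mem[pfn].head;
}

/* Uniformly split the folio at 'head' into folios of new_order. */
static void toy_split(size_t head, unsigned new_order)
{
	unsigned old_order = toy_mem[head].order;
	size_t end = head + ((size_t)1 << old_order);

	assert(toy_mem[head].head == head);
	assert(new_order <= old_order);

	for (size_t h = head; h < end; h += (size_t)1 << new_order)
		toy_make_folio(h, new_order);
}

/* Does the folio headed at folio_head currently cover pfn? */
static int toy_folio_contains(size_t folio_head, size_t pfn)
{
	size_t npages = (size_t)1 << toy_mem[folio_head].order;

	return pfn >= folio_head && pfn < folio_head + npages;
}
```

In this model, take an order-4 folio at index 0 with the poisoned page at index 9. A folio value looked up before toy_split(0, 2) still names the folio headed at 0, which now covers only pages 0-3. Calling toy_page_folio(9) again returns 8, the after-split folio that actually holds the poisoned page.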
Thank you for pointing it out. Let me fix it and let syzbot test it again.
BTW, do you mind explaining why the soft offline case does not want to split?
As in the memory failure case, splitting it would make the other after-split
folios available.
That's exactly what I think. Let's wait for Miaohe; I'm not sure whether he has
other concerns.
thanks,
-jane
Thanks.
Best Regards,
Yan, Zi