Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered

From: David Hildenbrand (arm)

Date: Wed Feb 04 2026 - 12:17:17 EST


On 2/4/26 18:12, David Hildenbrand (arm) wrote:
On 2/4/26 13:49, 是参差 wrote:
Hi,
I’m reporting a reproducible WARNING triggered in the hwpoison / memory_failure path when injecting a hardware-poison event via madvise(MADV_HWPOISON).

The warning is triggered by a syzkaller C reproducer that maps a file-backed region with MAP_FIXED, touches related VMAs, and then calls madvise() with MADV_HWPOISON over a large range.
The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from memory_failure() and points to include/linux/huge_mm.h:635, suggesting an unexpected folio/page state encountered while handling a poisoned compound/huge folio.

The target page appears to be a compound head page (order:3) already marked hwpoison. memory_failure() seems to reach a branch that unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/huge_mm.h:635), which usually indicates an “unreachable”/unexpected folio type or state transition in the huge/compound folio handling logic during hwpoison processing.

This looks like a kernel-side invariant violation rather than a pure userspace misuse, since the warning is emitted from an unconditional VM_WARN_ON_ONCE_FOLIO(1) site.

Reproducer:
C reproducer: https://pastebin.com/raw/UxennX2B
console output: https://pastebin.com/raw/wrhKRwZY
kernel config: https://pastebin.com/raw/dP93yBLn

Kernel:
HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
git tree: torvalds/linux
kernel version: 6.19.0-rc7 (QEMU Ubuntu 24.10)

@Zi Yan, this is weird.

We run into the VM_WARN_ON_ONCE_FOLIO(1, folio) in min_order_for_split(),
which is only active with !CONFIG_TRANSPARENT_HUGEPAGE.

But how do we get a large folio in that case? folio_test_large(folio) succeeded.

I think we rule out hugetlb earlier in that function.
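
To make the puzzle concrete, here is a rough userspace toy model of the control flow described above. It is not the kernel code: every toy_* name is hypothetical, and the struct only carries the two properties the argument needs. It assumes, as stated above, that hugetlb is handled first, that the large-folio test comes next, and that the !CONFIG_TRANSPARENT_HUGEPAGE variant of min_order_for_split() warns once and does nothing useful.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a folio; only what this sketch needs (hypothetical). */
struct toy_folio {
	unsigned int order;	/* 0 = single page, >0 = large/compound */
	bool hugetlb;		/* true if this models a hugetlb folio */
};

/* Number of times the modeled warning has actually fired. */
static int toy_warn_count;

static bool toy_folio_test_large(const struct toy_folio *f)
{
	return f->order > 0;
}

/*
 * Models the !THP variant: it should never be reached, so it warns
 * (once only, like a WARN_ON_ONCE) and reports order 0.
 */
static unsigned int toy_min_order_for_split_nothp(const struct toy_folio *f)
{
	static bool warned;

	if (!warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr, "toy WARN: order-%u folio seen without THP\n",
			f->order);
	}
	return 0;
}

/* Models the path: hugetlb is filtered out first, then the large check. */
static int toy_memory_failure(const struct toy_folio *f)
{
	if (f->hugetlb)
		return 0;	/* handled by a separate hugetlb path */

	if (toy_folio_test_large(f))
		(void)toy_min_order_for_split_nothp(f);

	return 0;
}
```

In this model, only a non-hugetlb object with order > 0 reaches the warning, which is exactly the case that should not exist without THP. Passing { .order = 3, .hugetlb = false } to toy_memory_failure() triggers the warning once; order-0 and hugetlb inputs do not.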


Looking into the full console output, this is an order-3 folio (fully mapped).

How do we end up with a large folio here? The only way I am aware of is something allocating
an order-3 compound page (not a folio) and mapping it into the page tables. Yes, that
is nasty and can still happen; I am not sure yet whether that is really what the reproducer
triggers.

Looking again,

mapping:0000000000000000 index:0xffff88800fe2e600

At least mapping==0 could indicate a non-folio thing.

--
Cheers,

David