Re: [PATCH v2] mm: thp: deny THP for files on anonymous inodes

From: Lance Yang

Date: Sat Feb 21 2026 - 23:10:25 EST




On 2026/2/16 06:48, Ackerley Tng wrote:
Lance Yang <lance.yang@xxxxxxxxx> writes:

On 2026/2/14 08:15, Deepanshu Kartikey wrote:
file_thp_enabled() incorrectly allows THP for files on anonymous inodes
(e.g. guest_memfd and secretmem). These files are created via
alloc_file_pseudo(), which does not call get_write_access() and leaves
inode->i_writecount at 0. Combined with S_ISREG(inode->i_mode) being
true, they appear as read-only regular files when
CONFIG_READ_ONLY_THP_FOR_FS is enabled, making them eligible for THP
collapse.

Anonymous inodes can never pass the inode_is_open_for_write() check
since their i_writecount is never incremented through the normal VFS
open path. The right thing to do is to exclude them from THP eligibility
altogether, since CONFIG_READ_ONLY_THP_FOR_FS was designed for real
filesystem files (e.g. shared libraries), not for pseudo-filesystem
inodes.
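
The eligibility logic described above can be sketched as a small userspace
model. This is only an illustration, not the kernel code or the patch itself:
struct model_inode, its is_anon field, and the model_* helpers are
hypothetical names invented here. The only things taken from the description
are the i_writecount == 0 and S_ISREG conditions and the
CONFIG_READ_ONLY_THP_FOR_FS gate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few inode properties the check reads. */
struct model_inode {
    bool is_regular;  /* models S_ISREG(inode->i_mode) */
    int  writecount;  /* models inode->i_writecount */
    bool is_anon;     /* assumed marker for "file on an anonymous inode" */
};

/* Models inode_is_open_for_write(): true only if a writer is counted. */
static bool model_open_for_write(const struct model_inode *inode)
{
    return inode->writecount > 0;
}

/* Behaviour as described: a regular file with no counted writers is eligible. */
static bool model_thp_enabled_before(const struct model_inode *inode,
                                     bool config_ro_thp)
{
    assert(inode != NULL);
    if (!config_ro_thp)
        return false;
    return !model_open_for_write(inode) && inode->is_regular;
}

/* Same check, with anonymous inodes excluded up front (assumed shape of the fix). */
static bool model_thp_enabled_after(const struct model_inode *inode,
                                    bool config_ro_thp)
{
    assert(inode != NULL);
    if (!config_ro_thp)
        return false;
    if (inode->is_anon)
        return false;
    return !model_open_for_write(inode) && inode->is_regular;
}
```

With { .is_regular = true, .writecount = 0, .is_anon = true } (the
guest_memfd/secretmem case), the "before" model reports the file as eligible
and the "after" model does not, while a read-only shared library
(is_anon = false) stays eligible in both.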

For guest_memfd, this allows khugepaged and MADV_COLLAPSE to create
large folios in the page cache via the collapse path, but the
guest_memfd fault handler does not support large folios. This triggers
WARN_ON_ONCE(folio_test_large(folio)) in kvm_gmem_fault_user_mapping().

For secretmem, collapse_file() tries to copy page contents through the
direct map, but secretmem pages are removed from the direct map. This
can result in a kernel crash:

Good catch, thanks!

For secretmem, file_thp_enabled() can incorrectly return true
(i_writecount=0, S_ISREG=1), so the mapping becomes eligible for file
THP collapse ...

However, if any folio is dirty, collapse bails out early with
SCAN_PAGE_DIRTY_OR_WRITEBACK, as secretmem doesn't support normal
writeback, IIUC.


Yup! In the reproducers [1] I had to try to avoid setting the dirty flag
on the pages.

[1] https://lore.kernel.org/linux-mm/CAEvNRgHegcz3ro35ixkDw39ES8=U6rs6S7iP0gkR9enr7HoGtA@xxxxxxxxxxxxxx
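
To illustrate why the reproducers have to keep the pages clean, here is a
deliberately simplified userspace model of the early bail-out. It is a sketch
under assumptions: the real collapse path has many more checks and result
codes, and struct model_folio, model_scan_range() and the MODEL_* values are
hypothetical names. Only the SCAN_PAGE_DIRTY_OR_WRITEBACK outcome comes from
the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of scan outcomes; the kernel defines many more. */
enum model_scan_result {
    MODEL_SCAN_SUCCEED,
    MODEL_SCAN_PAGE_DIRTY_OR_WRITEBACK,
};

/* Hypothetical stand-in for the two folio flags this model looks at. */
struct model_folio {
    bool dirty;
    bool writeback;
};

/*
 * Walk the candidate range and give up on the first folio that is dirty
 * or under writeback, so that a single dirty folio blocks the collapse.
 */
static enum model_scan_result
model_scan_range(const struct model_folio *folios, size_t n)
{
    assert(folios != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (folios[i].dirty || folios[i].writeback)
            return MODEL_SCAN_PAGE_DIRTY_OR_WRITEBACK;
    }
    return MODEL_SCAN_SUCCEED;
}
```

In this model, a range of clean folios passes the scan, and marking any one
of them dirty is enough to stop it, which matches the observation that the
problematic copy is only reached when no folio in the range is dirty.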


BUG: unable to handle page fault for address: ffff88810284d000
RIP: 0010:memcpy_orig+0x16/0x130
Call Trace:
collapse_file
hpage_collapse_scan_file
madvise_collapse

Secretmem is not affected by the crash on upstream as the memory failure
recovery handles the failed copy gracefully, but it still triggers
confusing false memory failure reports:

Memory failure: 0x106d96f: recovery action for clean unevictable
LRU page: Recovered

Right. On my setup, that would hit SCAN_COPY_MC in
hpage_collapse_scan_file() rather than a hard crash.


Deepanshu, were you able to trigger a hard crash on some earlier kernel?
I only saw this false memory failure log.

On a setup where memory failure recovery works, we can trigger a panic by
disabling recovery:

echo 0 > /proc/sys/vm/memory_failure_recovery

Then we would hit the following panic:

[ 117.608411] Kernel panic - not syncing: Memory failure on page 1024d6
[ 117.609490] CPU: 4 UID: 0 PID: 168 Comm: kworker/4:1 Not tainted 6.19.0 #83 PREEMPT(full)
[ 117.610817] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[ 117.612121] Workqueue: events memory_failure_work_func
[ 117.612978] Call Trace:
[ 117.613401] <TASK>
[ 117.613766] dump_stack_lvl+0x60/0x90
[ 117.614382] dump_stack+0x14/0x1a
[ 117.614940] vpanic+0x1a6/0x470
[ 117.615476] panic+0xc0/0xc0
[ 117.615967] ? __pfx_panic+0x10/0x10
[ 117.616571] ? update_cfs_rq_load_avg+0x5f/0x5a0
[ 117.617336] ? dequeue_entities+0x250/0x1e30
[ 117.618043] memory_failure.cold+0x2d/0x2d
[ 117.618725] ? __pfx_memory_failure+0x10/0x10
[ 117.619451] ? __raw_spin_lock_irqsave+0x8d/0xf0
[ 117.620215] ? __switch_to+0x3e9/0xb60
[ 117.620841] memory_failure_work_func+0x150/0x200
[ 117.621621] process_one_work+0x63d/0xf50
[ 117.622292] worker_thread+0x517/0xd90
[ 117.622915] ? __pfx_worker_thread+0x10/0x10
[ 117.623629] kthread+0x369/0x460
[ 117.624169] ? __pfx_kthread+0x10/0x10
[ 117.624796] ret_from_fork+0x33a/0x660
[ 117.625422] ? __pfx_ret_from_fork+0x10/0x10
[ 117.626126] ? switch_fpu+0x19/0x1f0
[ 117.626728] ? __switch_to+0x3e9/0xb60
[ 117.627354] ? __pfx_kthread+0x10/0x10
[ 117.627978] ret_from_fork_asm+0x1a/0x30
[ 117.628633] </TASK>
[ 117.629316] Kernel Offset: disabled
[ 117.629902] ---[ end Kernel panic - not syncing: Memory failure on page 1024d6 ]---