Re: [PATCH] mm: thp: Deny THP for guest_memfd and secretmem in file_thp_enabled()

From: David Hildenbrand (Arm)

Date: Mon Feb 09 2026 - 15:13:53 EST


On 2/9/26 20:45, David Hildenbrand (Arm) wrote:
On 2/9/26 19:22, Ackerley Tng wrote:
Deepanshu Kartikey <kartikey406@xxxxxxxxx> writes:

On Mon, Feb 9, 2026 at 4:12 PM David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:

Hi David,

Thanks for the suggestion. I looked into the get_write_access() path.

Both guest_memfd and secretmem use alloc_file_pseudo() which skips
calling get_write_access(), so i_writecount stays 0. That's why
file_thp_enabled() sees them as read-only files.

We could add get_write_access() after alloc_file_pseudo() in both, but
I think that would be a hack rather than a proper fix:

- i_writecount has a specific semantic: tracking how many fds have the
file open for writing. We'd be bumping it just to influence
file_thp_enabled() behavior.


I agree re-using i_writecount feels odd since it is abusing the idea of
being written to. I might have misunderstood the full context of
i_writecount though.

i_writecount means "the file is open with write access" IIUC. So one can mmap(PROT_WRITE) it etc.

And that's kind of the thing: the virtual file is open with write access. That's why I am still wondering whether mimicking that is actually the right fix.


- It doesn't express the actual intent. The real issue is that
CONFIG_READ_ONLY_THP_FOR_FS was never meant for pseudo-filesystem
backed files.

I think the AS_NO_READ_ONLY_THP_FOR_FS flag you suggested earlier is
the cleaner approach. It is explicit, has no side effects, and is easy
to rip out when CONFIG_READ_ONLY_THP_FOR_FS goes away.


I was considering other address space flags and I think the best might
be to make khugepaged respect AS_FOLIO_ORDER_MAX and have somewhere in
__vma_thp_allowable_orders() check the maximum allowed order for the
address space.

The thing is that CONFIG_READ_ONLY_THP_FOR_FS explicitly bypasses these folio order checks. Changing it would degrade filesystems that do not support large folios yet. IOW, it would be similar to ripping out CONFIG_READ_ONLY_THP_FOR_FS. Which we plan for one of the next releases :)


khugepaged is about consolidating memory to huge pages, so if the
address space doesn't allow a larger folio order, then khugepaged should
not operate on that memory.

The other options are

+ AS_UNEVICTABLE: Sounds like khugepaged should respect AS_UNEVICTABLE,
   but IIUC evictability is more closely related to swapping and
   khugepaged might operate on swappable memory?
Right, it does not really make sense

+ AS_INACCESSIBLE: This is only used by guest_memfd, and is mostly used
   to block migration. khugepaged kind of migrates the memory contents
   too, but someday we want guest_memfd to support migration, and at that
   time we would still want to block khugepaged, so I don't think we want
   to reuse a flag that couples khugepaged to migration.

It could be used at least for the time being and to fix the issue.

mapping_inaccessible(mapping) indeed looks like the easiest fix, given that
shmem "somehow" works, lol.

BUT, something just occurred to me.

We added the mc-handling in

commit 98c76c9f1ef7599b39bfd4bd99b8a760d4a8cd3b
Author: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>
Date: Wed Mar 29 08:11:19 2023 -0700

mm/khugepaged: recover from poisoned anonymous memory
..

So I assume kernels before that would crash when collapsing?

Looking at 5.15.199, it does not contain 98c76c9f1e [1].

So I suspect we need a fix+stable backport.

Who volunteers to try a secretmem reproducer on a stable kernel? :)


The following is a bit nasty as well but should do the trick until we rip
out the CONFIG_READ_ONLY_THP_FOR_FS stuff.


diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 03886d4ccecc..4ac1cb36b861 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -40,6 +40,7 @@
#include <linux/pgalloc.h>
#include <linux/pgalloc_tag.h>
#include <linux/pagewalk.h>
+#include <linux/secretmem.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -94,6 +95,10 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
inode = file_inode(vma->vm_file);
+ if (mapping_inaccessible(inode->i_mapping) ||
+ secretmem_mapping(inode->i_mapping))
+ return false;
+
return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}


[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/mm/khugepaged.c?h=v5.15.199

--
Cheers,

David