Re: [RFC PATCH 0/8] Introduce Reserved THP
From: Matthew Wilcox
Date: Sun Jun 28 2026 - 23:46:39 EST
On Sat, Jun 27, 2026 at 03:21:48PM +0800, Qi Zheng wrote:
> This RFC patchset introduces a new feature called "Reserved THP", and I'd like
> to open up a discussion on how to use this as a stepping stone toward unifying
> HugeTLB and THP (Transparent Huge Page).
I'm really happy you're looking into this. I'm not terribly familiar
with the page allocator code, so I don't have any comments on the
patches themselves, but I do have a few on your approach.
> Therefore, we are wondering if we can introduce "reserved THP", which is THP
> that can be reserved. It can be consumed through methods like madvise(), while
> normal memory allocation cannot consume it. This can achieve an effect similar
> to hugetlb. And because it is THP, it can relatively easily support swap
> features, which perfectly solves the above problem.
As I understand it, hugetlbfs takes its reservation at mmap() time.
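The timing matters because a request the pool cannot cover fails up front in mmap() with ENOMEM, not later at fault time. The sketch below is a deliberately simplified userspace model of that accounting, roughly what HugePages_Free and HugePages_Rsvd in /proc/meminfo report. `struct hpool`, `model_mmap_reserve()` and `model_fault_reserved()` are hypothetical names, not kernel code. The model ignores MAP_NORESERVE, surplus pages, subpools and private-mapping details.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified model of hugetlb pool accounting.
 * free: huge pages in the pool not yet handed out (cf. HugePages_Free)
 * rsvd: pages promised to mappings but not yet faulted in (cf. HugePages_Rsvd)
 * Invariant: free >= rsvd. */
struct hpool {
	long free;
	long rsvd;
};

/* mmap() time: reserve nr pages, or fail immediately with -ENOMEM. */
static int model_mmap_reserve(struct hpool *p, long nr)
{
	assert(nr >= 0);
	if (p->free - p->rsvd < nr)
		return -ENOMEM;
	p->rsvd += nr;
	return 0;
}

/* Fault time: a fault covered by a reservation consumes it. It cannot
 * fail, because the invariant guarantees a free page exists. */
static void model_fault_reserved(struct hpool *p)
{
	assert(p->rsvd > 0 && p->free >= p->rsvd);
	p->rsvd--;
	p->free--;
}
```

With four free pages, reserving three succeeds, and a further request for two fails at "mmap" time even though no page has been faulted in yet.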
> This RFC wants to discuss another implementation:
>
> 1. Introduce a new migratetype: MIGRATE_RESERVED_THP.
> 2. Introduce two new hugetlb-like kernel boot parameters: `thp_reserved_size`
> and `thp_reserved_nr`. When set, the required memory is marked as
> MIGRATE_RESERVED_THP and put back into the buddy allocator.
> 3. Introduce a new madvise parameter: `MADV_RESERVED_THP`. Pages marked as
> MIGRATE_RESERVED_THP can only be consumed via `madvise(MADV_RESERVED_THP)`.
> Other normal memory allocations cannot consume MIGRATE_RESERVED_THP memory.
>
> This can achieve a reservation effect similar to HugeTLB and guarantee
> allocation success.
I think this is an interesting approach. I don't think it should be too
hard to migrate existing hugetlbfs users to it.
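The consumption rule in the proposal (item 3 above) can be modelled as a filter on which free lists an allocation may use. The following is a toy sketch only. Only the MIGRATE_RESERVED_THP name comes from the RFC. The enum values, `struct toy_zone`, `alloc_from()` and its `want_reserved` flag are hypothetical stand-ins; the flag represents a fault in a range marked with madvise(MADV_RESERVED_THP). The sketch also assumes that reserved requests never fall back to ordinary memory. The real buddy allocator's fallback logic is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of migratetypes; MT_RESERVED_THP stands in for the
 * RFC's MIGRATE_RESERVED_THP, and the values are illustrative only. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RESERVED_THP, MT_NR };

/* Free huge-page-sized blocks per migratetype in one toy zone. */
struct toy_zone {
	long nr_free[MT_NR];
};

/* Take one block. Ordinary requests try their own type, then fall back to
 * the other ordinary type, and never touch MT_RESERVED_THP. A reserved
 * request (want_reserved) uses only the reserved list (an assumption).
 * Returns the migratetype used, or -1 on failure. */
static int alloc_from(struct toy_zone *z, enum mt type, bool want_reserved)
{
	if (want_reserved) {
		if (z->nr_free[MT_RESERVED_THP] == 0)
			return -1;
		z->nr_free[MT_RESERVED_THP]--;
		return MT_RESERVED_THP;
	}
	assert(type != MT_RESERVED_THP);

	enum mt fallback = (type == MT_MOVABLE) ? MT_UNMOVABLE : MT_MOVABLE;
	enum mt order[2] = { type, fallback };

	for (int i = 0; i < 2; i++) {
		if (z->nr_free[order[i]] > 0) {
			z->nr_free[order[i]]--;
			return order[i];
		}
	}
	return -1;
}
```

In this model, ordinary allocations can exhaust the ordinary lists while the reserved list stays untouched. That is the "guarantee allocation success" property the RFC describes.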
> 3. Future Plans
> ===============
>
> 3.1 Enhance swap-out and swap-in for large folios
> -------------------------------------------------
>
> Currently, for swap-out, THP_SWAP is supported, but it only tries to swap out
> the THP folio as a whole. The folio can still be forced to split in some
> situations (e.g., fragmented swap space, memory.swap.max limits, etc.). For
> swap-in, it is almost impossible to directly swap in the THP folio as a whole.
>
> But for reserved THP, splitting is not allowed. We need to ensure that it
> remains a whole huge page during swap-out and swap-in, to achieve a function
> similar to hugetlb swap.
So I think the current restriction is something that needs to be fixed
anyway. It doesn't actually make sense that a folio must be written out
contiguously; filesystems do not have this restriction. I understand
why swap currently has this limitation, but I'm hoping it gets removed
at some point. I'm not sure if the people working on swap right now
intend to fix this. They're already on the cc, so I hope they chime in.
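A 2MiB folio made of 4KiB base pages needs 512 swap slots. On a fragmented device, there can be plenty of free slots but no free run of 512. The sketch below shows why insisting on contiguity is what forces the split. It uses a plain bool array as a hypothetical stand-in for a swap device's slot map; the real swap allocator works with clusters and per-CPU caches. `alloc_contig()` and `alloc_scattered()` are illustrative names only. Both functions only search; neither marks slots as used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Find n contiguous free slots; return the first index, or -1 if none. */
static long alloc_contig(const bool *used, size_t nslots, size_t n)
{
	size_t run = 0;

	assert(n > 0);
	for (size_t i = 0; i < nslots; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == n)
			return (long)(i + 1 - n);
	}
	return -1;
}

/* Collect any n free slots into out[]; return true if enough were found.
 * This is what "a folio need not be written out contiguously" permits. */
static bool alloc_scattered(const bool *used, size_t nslots, size_t n,
			    size_t *out)
{
	size_t got = 0;

	for (size_t i = 0; i < nslots && got < n; i++)
		if (!used[i])
			out[got++] = i;
	return got == n;
}
```

Suppose every other slot is busy. Half the device is free, yet `alloc_contig()` finds no 512-slot run, while `alloc_scattered()` succeeds.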
> 3.2 Integrate reserved THP into the common reclaim path
> -------------------------------------------------------
>
> Once swap-in and swap-out of huge pages can be supported without splitting,
> reserved THP can be integrated into the common reclaim path as a normal LRU
> folio for memory reclamation. This fills the gap left by hugetlb's lack of swap support.
Hm. Then what does "reserved THP" mean if they can be swapped out?
> 3.4 Use reserved THP as a backend for hugetlbfs
> -----------------------------------------------
>
> This would allow existing hugetlb users or applications to seamlessly switch to
> reserved THP.
If this is the end goal, then I think introducing new command line
options is probably the wrong approach right now. Instead, "reserved
THPs" should be allocated from the same pool as hugetlb reserve. That
way we're not jerking sysadmins around.
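Sysadmins already size and monitor this pool. They set it with the hugepages= boot parameter or vm.nr_hugepages, and they watch it through the HugePages_* counters in /proc/meminfo. Sharing the pool would keep those counters meaningful. The sketch below is a small parser written against a string, so it can be tested without a live system. `meminfo_field()` is a hypothetical helper name, and the test data is sample input, not a real measurement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the numeric value of "key:" in /proc/meminfo-formatted text,
 * or -1 if the key is absent. HugePages_* lines carry a bare count;
 * most other lines end in "kB", which sscanf() simply stops before. */
static long meminfo_field(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	assert(klen > 0);
	while (line && *line) {
		long val;

		/* Require the exact key followed by ':' to avoid prefix matches. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

On a live system, you would first read /proc/meminfo into a buffer (for example with fread()) and then pass that buffer in.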
> 3.5 Add 1GB page support to reserved THP
> ----------------------------------------
>
> Historically, there have been several attempts to add 1GB huge page support to
> THP:
>
> 1. https://lore.kernel.org/linux-mm/20260202005451.774496-1-usamaarif642@xxxxxxxxx/
> 2. https://lore.kernel.org/linux-mm/20210224223536.803765-1-zi.yan@xxxxxxxx/
>
> Adding 1GB huge page support for reserved THP would be relatively simpler
> compared to regular THP.
Well. Maybe? What happens if we mmap() 16GiB,
madvise(MADV_RESERVED_THP) and then munmap() the first 4KiB of it?
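Here is the arithmetic behind that question. Assume the 16GiB range is 1GiB-aligned and fully backed by 1GiB folios. Unmapping [0, 4KiB) leaves 15 folios untouched. The remaining folio still has 262,143 of its 262,144 4KiB subpages mapped. That folio can no longer be mapped by a single PUD entry, so it must either be split or be mapped with smaller entries. The helper below counts intact, partially unmapped and fully unmapped folios for an arbitrary munmap() range. `classify_unmap()` and `struct unmap_effect` are hypothetical illustrations, and the helper assumes the region start is folio-aligned.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(4) << 10)
#define SZ_1G (UINT64_C(1) << 30)

struct unmap_effect {
	uint64_t intact;   /* folios not touched by the unmap */
	uint64_t partial;  /* folios losing some, but not all, subpages */
	uint64_t gone;     /* folios fully covered by the unmap */
};

/* Region [0, len) is backed by folios of size fsz; unmap [start, end).
 * Offsets are relative to a folio-aligned region start (an assumption). */
static struct unmap_effect classify_unmap(uint64_t len, uint64_t fsz,
					   uint64_t start, uint64_t end)
{
	struct unmap_effect e = { 0, 0, 0 };

	assert(fsz > 0 && len % fsz == 0 && start <= end && end <= len);
	for (uint64_t f = 0; f < len; f += fsz) {
		/* Overlap of [start, end) with this folio [f, f + fsz). */
		uint64_t lo = start > f ? start : f;
		uint64_t hi = end < f + fsz ? end : f + fsz;
		uint64_t cut = hi > lo ? hi - lo : 0;

		if (cut == 0)
			e.intact++;
		else if (cut == fsz)
			e.gone++;
		else
			e.partial++;
	}
	return e;
}
```

For the case in the question, `classify_unmap(16 * SZ_1G, SZ_1G, 0, SZ_4K)` reports 15 intact folios and 1 partial folio. Unmapping a whole aligned 1GiB instead yields no partial folio at all.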
> 3.6 Remove Hugetlb
> ------------------
>
> Once reserved THP can completely replace the existing functions of hugetlb, we
> can gradually remove Hugetlb, leaving only one huge page management system in
> the kernel.
We also need mshare to land ... but yes, eventually removing hugetlbfs
is my hope.