Re: [RFC PATCH 0/8] Introduce Reserved THP
From: Barry Song
Date: Tue Jun 30 2026 - 20:25:11 EST
On Wed, Jul 1, 2026 at 7:34 AM Zi Yan <ziy@xxxxxxxxxx> wrote:
>
> On Tue Jun 30, 2026 at 6:59 PM EDT, Barry Song wrote:
> > On Mon, Jun 29, 2026 at 8:20 PM David Hildenbrand (Arm)
> > <david@xxxxxxxxxx> wrote:
> > [...]
> >> >
> >> > 2. Implementation
> >> > =================
> >> >
> >> > In 2024, Yu Zhao proposed a similar idea:
> >> >
> >> > Link: https://lore.kernel.org/all/20240229183436.4110845-2-yuzhao@xxxxxxxxxx/
> >> >
> >> > The idea was to introduce two virt zones: ZONE_NOSPLIT and ZONE_NOMERGE to
> >> > guarantee the allocation success rate of THP, achieving an effect similar to
> >> > reservation. However, it seems there was no further progress, perhaps because of
> >> > reluctance to introduce more virt zones like ZONE_MOVABLE.
> >> >
> >> > This RFC wants to discuss another implementation:
> >> >
> >> > 1. Introduce a new migratetype: MIGRATE_RESERVED_THP.
> >> > 2. Introduce two new hugetlb-like kernel boot parameters: `thp_reserved_size`
> >> > and `thp_reserved_nr`. When set, the required memory is marked as
> >> > MIGRATE_RESERVED_THP and put back into the buddy allocator.
> >>
> >> I'm all for some mechanism to make runtime allocation of large chunks of memory
> >> easier, by adding a pool from where multiple consumers (THP, guest_memfd,
> >> hugetlb, whatever) can allocate memory.
> >>
> >> Call me very skeptical of getting the page allocator involved like this. (I hate it)
> >
> > One thing we've been thinking about for a while is whether we can
> > introduce something at the pageblock level to let memory "remember"
> > which allocation order is preferred within that pageblock.
> >
> > For example, if we ever allocate an order-0 page from pageblock 100,
> > that pageblock would later prefer order-0 allocations. Similarly, if
> > we allocate a large folio from pageblock 200, we would avoid using
> > pageblock 200 for order-0 allocations as long as there is still
> > memory available in pageblock 100 for order-0.
> >
> > Since order-0 allocations are often the main source of fragmentation,
> > if we already have both pagecache and anonymous large folios, we may
> > care more about containing or quarantining order-0 allocations in
> > certain areas, rather than trying to maintain a large-folio pool or
> > similar strategy.
>
> Aren't unmovable pages causing fragmentation? For movable pages,
> regardless of their orders, they can always be migrated if no additional
> pin is present.

Right, unmovable pages cause fragmentation, but movable pages can
also contribute to it. In terms of fragmentation itself, there is no
real difference between them; the only distinction is that movable
pages can be migrated or compacted.

My point is that compaction or direct reclaim may lead to both
allocation latency and increased power consumption, which may
defeat the purpose of using large folios to speed up Android phones.
Right now, order-0 allocations are spread everywhere, so compaction
and fragmentation may end up playing ping-pong, leaving compaction
ineffective overall.

Ideally, we should try to reduce or avoid triggering this kind of
migration, compaction, and reclaim.

Our recent experiments using 16KB large folios for both file and anon
show that we still frequently enter direct reclaim due to memory
fragmentation when allocating order-2 pages. In about 89% of
order-2 allocation cases, direct reclaim is triggered even though
free memory is above the watermark, because no contiguous order-2
block is available.

These movable order-0 allocations, which are spread across every
pageblock, are contributing to the fragmentation.
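
To make this concrete, here is a toy Python model, not kernel code.
The helper names and the 16-page layout are made up for illustration,
and it only checks naturally aligned blocks the way the buddy allocator
would. The same number of order-0 pages is in use in both layouts; only
their placement differs.

```python
def count_free_blocks(free, order):
    """Count naturally aligned, fully free blocks of 2**order pages."""
    size = 1 << order
    return sum(
        all(free[start:start + size])
        for start in range(0, len(free) - size + 1, size)
    )


def layout(nr_pages, used):
    """Return a free map (True = free) with the given page indexes in use."""
    used = set(used)
    return [i not in used for i in range(nr_pages)]


# Hypothetical 16-page region with four order-0 pages in use.
scattered = layout(16, [0, 4, 8, 12])  # one order-0 page in every 4-page group
contained = layout(16, [0, 1, 2, 3])   # all order-0 pages packed together

for name, free in (("scattered", scattered), ("contained", contained)):
    print(name, "free pages:", sum(free),
          "free order-2 blocks:", count_free_blocks(free, 2))
```

Both layouts have 12 of 16 pages free, but the scattered one has no
free order-2 block at all, so an order-2 request would have to go
through compaction or reclaim despite plenty of free memory.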
>
> If we use per-order pageblocks, how do we use pageblocks for rarely
> used orders? Do we allow lower orders to fall back to higher-order
> pageblocks?

Yes. If order-0 finds that all pageblocks associated with order-0
have been exhausted, it can allocate a new pageblock for this purpose,
or in the worst case, fall back to a pageblock associated with a
higher order.
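
A rough sketch of that selection order, again as a toy Python model
rather than a proposed implementation. Everything here is hypothetical:
the `Pageblock` class, `pick_pageblock()`, and the 512-page pageblock
size are assumptions for illustration, and tracking only a free-page
count ignores contiguity inside a pageblock.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGEBLOCK_PAGES = 512  # assumption: 2MB pageblock with 4KB base pages


@dataclass
class Pageblock:
    free_pages: int = PAGEBLOCK_PAGES
    preferred_order: Optional[int] = None  # None: not yet tied to an order


def pick_pageblock(blocks: List[Pageblock], order: int) -> Optional[int]:
    """Return the index of the pageblock to allocate from, or None."""
    need = 1 << order

    # 1. Prefer a pageblock already associated with this order.
    for i, b in enumerate(blocks):
        if b.preferred_order == order and b.free_pages >= need:
            return i
    # 2. Otherwise claim a pageblock that has no association yet.
    for i, b in enumerate(blocks):
        if b.preferred_order is None and b.free_pages >= need:
            b.preferred_order = order
            return i
    # 3. Worst case: fall back to a pageblock tied to a higher order,
    #    without changing that pageblock's preference.
    for i, b in enumerate(blocks):
        if (b.preferred_order is not None and b.preferred_order > order
                and b.free_pages >= need):
            return i
    return None


def alloc(blocks: List[Pageblock], order: int) -> Optional[int]:
    """Allocate 2**order pages and return the chosen pageblock index."""
    i = pick_pageblock(blocks, order)
    if i is not None:
        blocks[i].free_pages -= 1 << order
    return i
```

With two empty pageblocks, the first order-0 allocation claims one
pageblock and the first order-2 allocation claims the other; later
order-0 allocations stay in the order-0 pageblock until it runs out,
and only then spill into the order-2 one.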
>
> >
> > Chris’s de-fragmentation of swap slots[1] seems to be a big success
> > based on my observations, where he provides a similar memory-order
> > preference for swap clusters. There is no reservation mechanism, no
> > sysfs knob, and no need to split swap into two areas—everything
> > just works automatically.
> >
> > I wonder if you would be interested in something similar at the
> > pageblock level. If so, I’d be happy to work on a prototype in
> > August. I’m completely booked in July.
> >
> > [1] https://lore.kernel.org/all/20240730-swap-allocator-v5-0-cb9c148b9297@xxxxxxxxxx/
> >
>
> I feel that swap and page allocation have a fundamental distinction,
> where swap slots are not movable, but pages are. Memory compaction can
> move pages around to make space for high order allocations, but does
> swap support something similar? How will page mobility work in this swap
> slot defragmentation world?
>
> In addition, when swap space is full, or only order-0 swap slots are
> available but higher order folios want to be swapped out, folio swap
> might simply stop (except splitting folios to fill the order-0 slots).
> But for page allocation, some pages can be reclaimed/swapped to make
> space and this adds complexity.

Yes, I agree it adds complexity, but it might significantly help
address the fragmentation issue from a different angle. We used to
focus on migratetypes, but now we could shift perspective and try to
contain order-0 allocations, which are the real source of
fragmentation.

BTW, I haven’t started anything on this yet, because I’m not sure
whether there will be any positive feedback for it :-)

Best Regards
Barry