Re: [PATCH] Respect mempolicy when calculating surplus huge pages.

From: Joshua Hahn

Date: Tue Jun 09 2026 - 00:24:03 EST

On Mon, 8 Jun 2026 10:21:22 -0600 Charles Haithcock <chaithco@xxxxxxxxxx> wrote:

> On Tue, Jun 2, 2026, 9:20 AM Joshua Hahn <joshua.hahnjy@xxxxxxxxx> wrote:
>
> > Hello Charles, thank you for the patch!
>
> You are so welcome, and thank you! I'm newer to contributing to upstream so
> I welcome any and all feedback!

Hello Charles, welcome!! :-)

> > I just wanted to add that it seems like this is a known issue. From the
> > comment in hugetlb_acct_memory (the only caller of gather_surplus_pages)
> > we have the following comment block:
> > [...]
> > So it would appear that getting an exact number of pages to allocate,
> > and ensuring that there are no changes to the reservation or to which
> > nodes those reservations actually go, is a lot more difficult. But I
> > think we can do a bit better.
>
> Agreed; actually, after submitting the initial version, I went back and
> have been working on adding per-node reserve hugepage counts to the hstates
> and adding some selftests for them. It's going well so far, but I'm
> hitting an issue in accounting where, after reserving a static hugetlb page
> on node0, reserving and then freeing a surplus hugetlb page on node1, and
> then freeing the static hugetlb page on node0, followup reservations on
> node0 additionally count towards surplus pages even when there are free
> static hugetlb pages on that node. Trace-cmd revealed allowed_mems_nr_free
> is returning more than what was requested resulting in the over-allocated
> surplus pages to stick around rather than be freed back.

I see. So from the perspective of the system, even though you free the
static hugetlb page, the system views that as unavailable to other
reservations and so future allocations go towards surplus pages?
How exactly are you freeing the page in this case? And how long do you
wait between freeing the static page and allocating the next page?

I wonder whether you see this issue without your per-node reserve
changes as well. I have seen quite a few hugetlb folio
reservation restoration accounting bugs recently, including some that
I think are still in mm-new. If you're developing against anything
other than mm-new, I'd suggest trying to build on top of that to see
if your bug still persists.

> I plan to dig into
> what's causing that, however, if doing so leads down a nasty path of adding
> per-node accounting, I may need to fall back to the first version of this
> patch (but with selftests) and be ok with simply doing "a bit better" like
> you mention.

Sounds good! I wonder if it would be a good exercise to measure how
much we typically over-allocate by when using the simplified calculation
I noted in my previous reply.

> > FWIW, I think over-allocating is actually not fatal (although
> > over-allocating by a lot is obviously not desirable) since we free
> > all the unused hugetlb pages at the end of gather_surplus_pages.
>
> Agreed on this point as well; afaik, surplus hugetlb pages are viewed quite
> differently from static hugetlb pages where surplus hugetlb pages are "best
> effort." Obviously we should try to reserve the requested amount, but true
> users of large quantities of hugetlb pages should tend towards setting them
> on boot rather than dynamically.

Yup, totally agree here.

> > [...] I wonder if an approach like this
> > could work:
> > [...]
> > So we compare the mempolicy-perspective "needed" to the global
> > "needed" and take whichever is larger. Since we are taking a max it should
> > only ever make it more likely to actually succeed with the mempolicy-bound
> > hugetlb page usage, even though we still can't make guarantees since
> > a free page on our node may be taken by a different reservation later.
>
> This is considerably simpler than a per-node reserved hugetlb page count!
> If I end up falling back to my initial patch, I will absolutely be adding
> this and attributing you. Thank you!
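
For anyone following along, here is a rough userspace sketch of the
"take the max" idea. It is not the kernel code: struct toy_hstate, the
helper names, and the policy-side formula (delta minus the free pages on
the allowed nodes, ignoring reservations because no per-node reserve
count exists) are all assumptions made for illustration.

```c
#define MAX_NODES 4

/* Toy stand-in for the per-hstate counters; NOT the kernel's struct hstate. */
struct toy_hstate {
	long free_huge_pages;                 /* free pages across all nodes */
	long resv_huge_pages;                 /* reserved pages across all nodes */
	long free_huge_pages_node[MAX_NODES]; /* free pages on each node */
};

/* Global view: pages still missing after existing reservations are counted. */
static long needed_global(const struct toy_hstate *h, long delta)
{
	return (h->resv_huge_pages + delta) - h->free_huge_pages;
}

/*
 * Policy view (assumed formula): only free pages on nodes allowed by the
 * mempolicy can satisfy the request. allowed_mask has bit N set if node N
 * is allowed.
 */
static long needed_policy(const struct toy_hstate *h, long delta,
			  unsigned int allowed_mask)
{
	long free_allowed = 0;
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (allowed_mask & (1u << nid))
			free_allowed += h->free_huge_pages_node[nid];

	return delta - free_allowed;
}

/* Take the larger of the two estimates; a non-positive result means 0. */
static long needed_surplus(const struct toy_hstate *h, long delta,
			   unsigned int allowed_mask)
{
	long g = needed_global(h, delta);
	long p = needed_policy(h, delta, allowed_mask);
	long n = g > p ? g : p;

	return n > 0 ? n : 0;
}
```

With 4 free pages that all sit on node 0, no reservations, and a request
for 2 pages bound to node 1, the global estimate is -2 while the policy
estimate is 2, so the sketch asks for 2 surplus pages. The same request
bound to node 0 asks for none.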

Sounds good to me :-) Good luck and have a great day!
Joshua