Re: [PATCH] hugetlb: prioritize surplus allocation from current node

From: Aristeu Rozanski
Date: Wed Dec 04 2024 - 17:29:55 EST


On Thu, Dec 05, 2024 at 01:55:03AM +0900, Koichiro Den wrote:
> Previously, surplus allocations triggered by mmap were typically made
> from the node where the process was running. On a page fault, the folio
> backing the area was reliably dequeued from the hugepage_freelists of
> that node.
> However, since commit 003af997c8a9 ("hugetlb: force allocating surplus
> hugepages on mempolicy allowed nodes"), dequeue_hugetlb_folio_vma() may
> fall back to other nodes unnecessarily even if there is no MPOL_BIND
> policy, causing folios to be dequeued from nodes other than the current
> one.
>
> Also, allocating from the node where the current process is running is
> likely to result in a performance win, as mmap-ing processes often
> touch the area soon after allocation. This change minimizes
> surprises for users relying on the previous behavior while maintaining
> the benefit introduced by the commit.
>
> So, prioritize the node the current process is running on when possible.
>
> Signed-off-by: Koichiro Den <koichiro.den@xxxxxxxxxxxxx>
> ---
> mm/hugetlb.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5c8de0f5c760..0fa24e105202 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2463,7 +2463,13 @@ static int gather_surplus_pages(struct hstate *h, long delta)
>  	long needed, allocated;
>  	bool alloc_ok = true;
>  	int node;
> -	nodemask_t *mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
> +	nodemask_t *mbind_nodemask, alloc_nodemask;
> +
> +	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
> +	if (mbind_nodemask)
> +		nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed);
> +	else
> +		alloc_nodemask = cpuset_current_mems_allowed;
>
>  	lockdep_assert_held(&hugetlb_lock);
>  	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
> @@ -2479,8 +2485,16 @@ static int gather_surplus_pages(struct hstate *h, long delta)
>  	spin_unlock_irq(&hugetlb_lock);
>  	for (i = 0; i < needed; i++) {
>  		folio = NULL;
> -		for_each_node_mask(node, cpuset_current_mems_allowed) {
> -			if (!mbind_nodemask || node_isset(node, *mbind_nodemask)) {
> +
> +		/* Prioritize current node */
> +		if (node_isset(numa_mem_id(), alloc_nodemask))
> +			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
> +					numa_mem_id(), NULL);
> +
> +		if (!folio) {
> +			for_each_node_mask(node, alloc_nodemask) {
> +				if (node == numa_mem_id())
> +					continue;
>  				folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
>  						node, NULL);
>  				if (folio)

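As a side note for anyone following the thread: below is a minimal
userspace sketch (an illustration only, not part of the patch) of the
path the changelog describes. It assumes 2 MB huge pages, two NUMA
nodes, nr_hugepages == 0 and nr_overcommit_hugepages > 0, so that the
mmap() reservation has to go through gather_surplus_pages() and the
first write fault dequeues the folio from a node's freelist.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (2UL * 1024 * 1024)	/* one 2 MB huge page (assumed default size) */

/* Print per-node surplus counters (assumes nodes 0 and 1 exist). */
static void dump_surplus(const char *when)
{
	char path[128];
	int node, val;
	FILE *f;

	for (node = 0; node < 2; node++) {
		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/hugepages/"
			 "hugepages-2048kB/surplus_hugepages", node);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fscanf(f, "%d", &val) != 1)
			val = -1;
		fclose(f);
		printf("%s: node%d surplus_hugepages=%d\n", when, node, val);
	}
}

int main(void)
{
	void *p;

	dump_surplus("before mmap");

	/* Reservation path: this is where gather_surplus_pages() runs. */
	p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	dump_surplus("after mmap");

	/* Fault path: dequeues the folio from the chosen node's freelist. */
	memset(p, 0, LEN);
	dump_surplus("after touch");

	munmap(p, LEN);
	return 0;
}

With the change above, the surplus bump after mmap() should land on the
node the task is running on (when it is in the allowed mask) rather than
on whichever allowed node happens to come first.
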
Acked-by: Aristeu Rozanski <aris@xxxxxxxxx>

--
Aristeu