Re: [PATCH v2 1/9] mm/hugetlb: remove checking hstate_is_gigantic() in return_unused_surplus_pages()

From: HORIGUCHI NAOYA(堀口 直也)
Date: Mon Jun 27 2022 - 02:03:05 EST


On Fri, Jun 24, 2022 at 12:11:07PM -0700, Mike Kravetz wrote:
> On 06/24/22 08:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Fri, Jun 24, 2022 at 04:15:19PM +0800, Miaohe Lin wrote:
> > > On 2022/6/24 16:03, Muchun Song wrote:
> > > > On Fri, Jun 24, 2022 at 10:25:48AM +0800, Miaohe Lin wrote:
> > > >> On 2022/6/24 7:51, Naoya Horiguchi wrote:
> > > >>> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> > > >>>
> > > >>> I found a weird state of the 1GB hugepage pool, caused by the following
> > > >>> procedure:
> > > >>>
> > > >>> - run a process reserving all free 1GB hugepages,
> > > >>> - shrink the free 1GB hugepage pool to zero (i.e. write 0 to
> > > >>> /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages), then
> > > >>> - kill the reserving process.
> > > >>>
> > > >>> Afterwards, all the hugepages are free *and* surplus at the same time.
> > > >>>
> > > >>> $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> > > >>> 3
> > > >>> $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
> > > >>> 3
> > > >>> $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/resv_hugepages
> > > >>> 0
> > > >>> $ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/surplus_hugepages
> > > >>> 3
> > > >>>
> > > >>> This state can be resolved by reserving and allocating the pages and
> > > >>> then freeing them again, so it does not seem to cause a serious problem.
> > > >>> But it's a little surprising (shrinking the pool suddenly fails).
> > > >>>
> > > >>> This behavior is caused by the hstate_is_gigantic() check in
> > > >>> return_unused_surplus_pages(). The check was introduced long ago, in 2008,
> > > >>> by commit aa888a74977a ("hugetlb: support larger than MAX_ORDER"), and
> > > >>> it seems to me that it is no longer necessary. Let's remove it.
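The procedure above can be replayed on a toy model of the pool counters. This is a minimal sketch, not kernel code: struct pool_model and the model_* helpers are hypothetical, per-node accounting, locking and real page freeing are ignored, and the shrink and exit paths are reduced to a simplified version of the counter arithmetic in set_max_huge_pages() and return_unused_surplus_pages(). The gigantic_check flag stands in for the hstate_is_gigantic() early exit that the patch removes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one hstate's counters (names mirror sysfs). */
struct pool_model {
	unsigned long nr;	/* nr_hugepages */
	unsigned long free;	/* free_hugepages */
	unsigned long resv;	/* resv_hugepages */
	unsigned long surplus;	/* surplus_hugepages */
};

static unsigned long persistent(const struct pool_model *p)
{
	return p->nr - p->surplus;
}

/* A process reserves n free pages without faulting them in. */
static void model_reserve(struct pool_model *p, unsigned long n)
{
	assert(n <= p->free - p->resv);
	p->resv += n;
}

/* Simplified shrink path: writing count to nr_hugepages. */
static void model_shrink(struct pool_model *p, unsigned long count)
{
	/* Reserved and in-use pages cannot be freed right away. */
	unsigned long min_count = p->resv + p->nr - p->free;

	if (min_count < count)
		min_count = count;
	/* Return unreserved free pages to the page allocator. */
	while (min_count < persistent(p) && p->free > 0) {
		p->nr--;
		p->free--;
	}
	/* Pages still needed are converted to surplus instead. */
	while (count < persistent(p) && p->surplus < p->nr)
		p->surplus++;
}

/* Simplified return_unused_surplus_pages() on process exit. */
static void model_unreserve(struct pool_model *p, unsigned long unused,
			    bool gigantic_check)
{
	unsigned long n;

	assert(unused <= p->resv);
	p->resv -= unused;
	if (gigantic_check)	/* mimics the removed early exit */
		return;
	n = unused < p->surplus ? unused : p->surplus;
	while (n > 0 && p->free > 0) {
		p->nr--;
		p->free--;
		p->surplus--;
		n--;
	}
}

/* Replays reserve / shrink to 0 / kill on a 3-page pool. */
static struct pool_model model_replay(bool gigantic_check)
{
	struct pool_model p = { .nr = 3, .free = 3, .resv = 0, .surplus = 0 };

	model_reserve(&p, 3);
	model_shrink(&p, 0);
	model_unreserve(&p, 3, gigantic_check);
	return p;
}
```

In this model, model_replay(true) ends with nr=3, free=3, resv=0, surplus=3, matching the sysfs output above, and a further model_shrink(&p, 0) changes nothing; model_replay(false) returns all three surplus pages and leaves every counter at 0.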
>
> Thank you for finding this!!!
>
> > > >>> +++ b/mm/hugetlb.c
> > > >>> @@ -2432,10 +2432,6 @@ static void return_unused_surplus_pages(struct hstate *h,
> > > >>> /* Uncommit the reservation */
> > > >>> h->resv_huge_pages -= unused_resv_pages;
> > > >>>
> > > >>> - /* Cannot return gigantic pages currently */
> > > >>> - if (hstate_is_gigantic(h))
> > > >>> - goto out;
> > > >>> -
> > > >>
> > > >> IIUC, it might be better to do the following check:
> > > >> /*
> > > >> * Cannot return gigantic pages currently if runtime gigantic page
> > > >> * allocation is not supported.
> > > >> */
> > > >> if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > > >> goto out;
> > > >>
> > > >
> > > > The change looks good to me. However, the comment above is unnecessary
> > > > since gigantic_page_runtime_supported() is straightforward.
> > >
> > > Agreed. The comment can be removed.
> >
> > Thank you both. Adding the !gigantic_page_runtime_supported() check
> > without a comment makes sense to me.
>
> The change above makes sense to me. However, ...
>
> If we make the change above, will we have the same strange situation described
> in the commit message when !gigantic_page_runtime_supported() is true?
>
> IIUC, !gigantic_page_runtime_supported() implies that gigantic hugetlb
> pages cannot be allocated or freed at run time. They can only be
> allocated at boot time. So, there should NEVER be surplus gigantic
> pages if !gigantic_page_runtime_supported().

I have the same understanding as the above.

> To avoid this situation,
> perhaps we should change set_max_huge_pages as follows (not tested)?

The suggested diff makes it clearer what is being done, so I'd like to take
it in the next version. Then, what should we do about the
"if (hstate_is_gigantic())" check in return_unused_surplus_pages() in the
original suggestion? Should it be kept as is, removed, or combined with
!gigantic_page_runtime_supported()? I guess that the new checks prevent
return_unused_surplus_pages() from being called during pool shrinking, so
the check no longer seems necessary.
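The two guards that the suggested set_max_huge_pages() diff (quoted below) adds can be reduced to a small decision function. This is a sketch under assumptions: model_resize_check() is hypothetical, its boolean parameters stand in for hstate_is_gigantic(h), IS_ENABLED(CONFIG_CONTIG_ALLOC) and gigantic_page_runtime_supported(), persistent stands in for persistent_huge_pages(h), and the allocation loop that runs between the two checks in the real function is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early checks in the suggested diff.
 * Returns 0 if the resize would proceed, -EINVAL if it is rejected.
 * Locking, the allocation loop and the actual freeing are omitted.
 */
static int model_resize_check(bool gigantic, bool contig_alloc,
			      bool runtime_supported,
			      unsigned long persistent, unsigned long count)
{
	/* First hunk: growing a gigantic pool needs runtime allocation. */
	if (gigantic && (!contig_alloc || !runtime_supported)) {
		if (count > persistent)
			return -EINVAL;
	}
	/* Second hunk: shrinking it needs runtime freeing support. */
	if (gigantic && !runtime_supported) {
		if (count < persistent)
			return -EINVAL;
	}
	return 0;
}
```

In this model, a gigantic hstate without runtime support accepts only count == persistent (a no-op), so a resize can no longer turn persistent gigantic pages into surplus pages, which is consistent with Mike's point that there should never be surplus gigantic pages in that configuration. Without CONFIG_CONTIG_ALLOC but with runtime support, shrinking is still allowed, so boot-time pages can be freed.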

Thanks,
Naoya Horiguchi

>
> --
> Mike Kravetz
>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5eabb8009d8a..c78d6c20e6b0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3416,7 +3416,8 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> * the user tries to allocate gigantic pages but let the user free the
> * boottime allocated gigantic pages.
> */
> - if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
> + if (hstate_is_gigantic(h) && (!IS_ENABLED(CONFIG_CONTIG_ALLOC) ||
> + !gigantic_page_runtime_supported())) {
> if (count > persistent_huge_pages(h)) {
> spin_unlock_irq(&hugetlb_lock);
> mutex_unlock(&h->resize_lock);
> @@ -3464,6 +3465,19 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
> goto out;
> }
>
> + /*
> + * We can not decrease gigantic pool size if runtime modification
> + * is not supported.
> + */
> + if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) {
> + if (count < persistent_huge_pages(h)) {
> + spin_unlock_irq(&hugetlb_lock);
> + mutex_unlock(&h->resize_lock);
> + NODEMASK_FREE(node_alloc_noretry);
> + return -EINVAL;
> + }
> + }
> +
> /*
> * Decrease the pool size
> * First return free pages to the buddy allocator (being careful