Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
From: Mike Kravetz
Date: Mon Jun 27 2022 - 14:42:43 EST
On 06/24/22 17:36, James Houghton wrote:
> When using HugeTLB high-granularity mapping, we need to go through the
> supported hugepage sizes in decreasing order so that we pick the largest
> size that works. Consider the case where we're faulting in a 1G hugepage
> for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> a PUD. By going through the sizes in decreasing order, we will find that
> PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE would work too.
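
IIUC the point is that a consumer of the sorted array can simply take the
first size that fits. A minimal sketch of such a consumer (illustrative
only, not part of this patch; the helper name is made up):

	/*
	 * Walk the hstates, now sorted largest to smallest, and return
	 * the first (i.e. largest) hugepage size that fits in @len.
	 */
	static struct hstate *largest_fitting_hstate(unsigned long len)
	{
		struct hstate *h;

		for_each_hstate(h)
			if (huge_page_size(h) <= len)
				return h;	/* first match is the largest */
		return NULL;
	}
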
>
> Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> ---
> mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
> 1 file changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a57e1be41401..5df838d86f32 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -33,6 +33,7 @@
> #include <linux/migrate.h>
> #include <linux/nospec.h>
> #include <linux/delayacct.h>
> +#include <linux/sort.h>
>
> #include <asm/page.h>
> #include <asm/pgalloc.h>
> @@ -48,6 +49,10 @@
>
> int hugetlb_max_hstate __read_mostly;
> unsigned int default_hstate_idx;
> +/*
> + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> + * to smallest.
> + */
> struct hstate hstates[HUGE_MAX_HSTATE];
>
> #ifdef CONFIG_CMA
> @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> kfree(node_alloc_noretry);
> }
>
> +static int compare_hstates_decreasing(const void *a, const void *b)
> +{
> + const int shift_a = huge_page_shift((const struct hstate *)a);
> + const int shift_b = huge_page_shift((const struct hstate *)b);
> +
> + if (shift_a < shift_b)
> + return 1;
> + if (shift_a > shift_b)
> + return -1;
> + return 0;
> +}
> +
> +static void sort_hstates(void)
> +{
> + unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> +
> + /* Sort from largest to smallest. */
> + sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> + compare_hstates_decreasing, NULL);
> +
> + /*
> + * We may have changed the location of the default hstate, so we need to
> + * update it.
> + */
> + default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> +}
> +
> static void __init hugetlb_init_hstates(void)
> {
> struct hstate *h, *h2;
>
> - for_each_hstate(h) {
> - if (minimum_order > huge_page_order(h))
> - minimum_order = huge_page_order(h);
> + sort_hstates();
>
> + /* The last hstate is now the smallest. */
> + minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> +
> + for_each_hstate(h) {
> /* oversize hugepages were init'ed in early boot */
> if (!hstate_is_gigantic(h))
> hugetlb_hstate_alloc_pages(h);
This may/will cause problems for gigantic hugetlb pages allocated at boot
time. See alloc_bootmem_huge_page(), where a pointer to the associated
hstate is encoded within the allocated hugetlb page. These pages are added
to the hugetlb pools by gather_bootmem_prealloc(), which uses the saved
hstate to prep the gigantic page and add it to the correct pool. Currently,
gather_bootmem_prealloc is called after hugetlb_init_hstates, so changing
the hstate order will cause errors.
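
Roughly, simplified from the existing code to show the hazard:

	/* in alloc_bootmem_huge_page(), at early boot: */
	m->hstate = h;			/* pointer into the hstates[] array */

	/* in gather_bootmem_prealloc(), which runs after sort_hstates(): */
	struct hstate *h = m->hstate;	/* the slot contents have moved, so
					 * this may now name a different
					 * hstate */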
I do not see any reason why we could not call gather_bootmem_prealloc before
hugetlb_init_hstates to avoid this issue.
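
Something like this in hugetlb_init() should do it (untested):

-	hugetlb_init_hstates();
-	gather_bootmem_prealloc();
+	gather_bootmem_prealloc();
+	hugetlb_init_hstates();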
--
Mike Kravetz