Re: [PATCH 2/5] mm: hugetlb: introduce helpers to preallocate page tables from bootmem allocator

From: Mike Kravetz
Date: Thu Jun 10 2021 - 18:13:45 EST


On 6/9/21 5:13 AM, Muchun Song wrote:
> If we want to split the huge PMD of vmemmap pages associated with each
> gigantic page allocated from bootmem allocator, we should pre-allocate
> the page tables from bootmem allocator.

Just curious: why is this necessary, and why is it a good idea? Why not
wait until the gigantic pages allocated from bootmem are added to the
pool to allocate any necessary vmemmap pages?

> In this patch, we introduce
> some helpers to preallocate page tables for gigantic pages.
>
> Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> ---
> include/linux/hugetlb.h | 3 +++
> mm/hugetlb_vmemmap.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
> mm/hugetlb_vmemmap.h | 13 ++++++++++
> 3 files changed, 79 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 03ca83db0a3e..c27a299c4211 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -622,6 +622,9 @@ struct hstate {
> struct huge_bootmem_page {
> struct list_head list;
> struct hstate *hstate;
> +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> + pte_t *vmemmap_pte;
> +#endif
> };
>
> int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 628e2752714f..6f3a47b4ebd3 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -171,6 +171,7 @@
> #define pr_fmt(fmt) "HugeTLB: " fmt
>
> #include <linux/list.h>
> +#include <linux/memblock.h>
> #include <asm/pgalloc.h>
>
> #include "hugetlb_vmemmap.h"
> @@ -263,6 +264,68 @@ int vmemmap_pgtable_prealloc(struct hstate *h, struct list_head *pgtables)
> return -ENOMEM;
> }
>
> +unsigned long __init gigantic_vmemmap_pgtable_prealloc(void)
> +{
> + struct huge_bootmem_page *m, *tmp;
> + unsigned long nr_free = 0;
> +
> + list_for_each_entry_safe(m, tmp, &huge_boot_pages, list) {
> + struct hstate *h = m->hstate;
> + unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
> + unsigned long size;
> +
> + if (!nr)
> + continue;
> +
> + size = nr << PAGE_SHIFT;
> + m->vmemmap_pte = memblock_alloc_try_nid(size, PAGE_SIZE, 0,
> + MEMBLOCK_ALLOC_ACCESSIBLE,
> + NUMA_NO_NODE);
> + if (!m->vmemmap_pte) {
> + nr_free++;
> + list_del(&m->list);
> + memblock_free_early(__pa(m), huge_page_size(h));

If we cannot allocate the vmemmap pages to split the PMD, then we will
not add the huge page to the pool. Correct?

Perhaps I am thinking about this incorrectly, but this seems wrong. We
already have everything we need to add the page to the pool. vmemmap
reduction is an optimization. So, the allocation failure is associated
with an optimization. In this case, it seems like we should just skip
the optimization (vmemmap reduction) and proceed to add the page to the
pool? It seems we do the same thing in subsequent patches.

Again, I could be thinking about this incorrectly.
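
To make the suggestion concrete, here is a rough sketch of the fallback
described above. It is a userspace model, not kernel code: struct
boot_page, prealloc_or_skip() and the alloc_ok()/alloc_fail() hooks are
hypothetical stand-ins invented for illustration, and malloc() merely
takes the place of the bootmem allocation. The only point is the control
flow: a failed page table allocation leaves the page in the pool and
skips the optimization for that page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct huge_bootmem_page. */
struct boot_page {
	void *vmemmap_pte;	/* NULL: no page tables, optimization skipped */
	bool in_pool;		/* page is still usable as a huge page */
};

/* Hypothetical allocator hook so that a failure can be simulated. */
typedef void *(*pgtable_alloc_fn)(size_t size);

static void *alloc_ok(size_t size)
{
	return malloc(size);
}

static void *alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Try to preallocate page tables for every page. If the allocation
 * fails, the page is NOT dropped: it stays in the pool and vmemmap_pte
 * is left NULL so later code can tell the optimization was skipped.
 * Returns the number of pages that could not be optimized.
 */
static size_t prealloc_or_skip(struct boot_page *pages, size_t n,
			       size_t size, pgtable_alloc_fn alloc)
{
	size_t nr_unoptimized = 0;

	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		pages[i].in_pool = true;
		pages[i].vmemmap_pte = alloc(size);
		if (!pages[i].vmemmap_pte)
			nr_unoptimized++;
	}

	return nr_unoptimized;
}
```

With alloc_fail() every page remains in the pool with a NULL
vmemmap_pte, whereas the quoted hunk would free the page instead.
Whether a NULL vmemmap_pte is a safe marker for the later patches in
the series is an assumption of this sketch.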
--
Mike Kravetz