Re: [External] Re: [PATCH v2 07/19] mm/hugetlb: Free the vmemmap pages associated with each hugetlb page

From: Muchun Song
Date: Thu Oct 29 2020 - 03:41:39 EST


On Thu, Oct 29, 2020 at 7:42 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> On 10/26/20 7:51 AM, Muchun Song wrote:
> > When we allocate a hugetlb page from the buddy, we should free the
> > unused vmemmap pages associated with it. We can do that in the
> > prep_new_huge_page().
> >
> > Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> > ---
> > arch/x86/include/asm/hugetlb.h | 7 +
> > arch/x86/include/asm/pgtable_64_types.h | 8 +
> > include/linux/hugetlb.h | 7 +
> > mm/hugetlb.c | 190 ++++++++++++++++++++++++
> > 4 files changed, 212 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
> > index f5e882f999cd..7c3eb60c2198 100644
> > --- a/arch/x86/include/asm/hugetlb.h
> > +++ b/arch/x86/include/asm/hugetlb.h
> > @@ -4,10 +4,17 @@
> >
> > #include <asm/page.h>
> > #include <asm-generic/hugetlb.h>
> > +#include <asm/pgtable.h>
> >
> > #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > #define VMEMMAP_HPAGE_SHIFT PMD_SHIFT
> > #define arch_vmemmap_support_huge_mapping() boot_cpu_has(X86_FEATURE_PSE)
> > +
> > +#define vmemmap_pmd_huge vmemmap_pmd_huge
> > +static inline bool vmemmap_pmd_huge(pmd_t *pmd)
> > +{
> > + return pmd_large(*pmd);
> > +}
> > #endif
> >
> > #define hugepages_supported() boot_cpu_has(X86_FEATURE_PSE)
> > diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> > index 52e5f5f2240d..bedbd2e7d06c 100644
> > --- a/arch/x86/include/asm/pgtable_64_types.h
> > +++ b/arch/x86/include/asm/pgtable_64_types.h
> > @@ -139,6 +139,14 @@ extern unsigned int ptrs_per_p4d;
> > # define VMEMMAP_START __VMEMMAP_BASE_L4
> > #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
> >
> > +/*
> > + * VMEMMAP_SIZE - allows the whole linear region to be covered by
> > + * a struct page array.
> > + */
> > +#define VMEMMAP_SIZE (1UL << (__VIRTUAL_MASK_SHIFT - PAGE_SHIFT - \
> > + 1 + ilog2(sizeof(struct page))))
> > +#define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE)
> > +
> > #define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
> >
> > #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index ace304a6196c..919f47d77117 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -601,6 +601,13 @@ static inline bool arch_vmemmap_support_huge_mapping(void)
> > }
> > #endif
> >
> > +#ifndef vmemmap_pmd_huge
>
> Let's add
> #define vmemmap_pmd_huge vmemmap_pmd_huge
> just in case code gets moved around in header file.

OK, will do.

>
> > +static inline bool vmemmap_pmd_huge(pmd_t *pmd)
> > +{
> > + return pmd_huge(*pmd);
> > +}
> > +#endif
> > +
> > #ifndef VMEMMAP_HPAGE_SHIFT
> > #define VMEMMAP_HPAGE_SHIFT PMD_SHIFT
> > #endif
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index d6ae9b6876be..aa012d603e06 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1293,10 +1293,20 @@ static inline void destroy_compound_gigantic_page(struct page *page,
> > #endif
> >
> > #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > +#include <linux/bootmem_info.h>
> > +
> > #define RESERVE_VMEMMAP_NR 2U
> > +#define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
>
> Since RESERVE_VMEMMAP_SIZE is not used here, perhaps it should be added
> in the patch where it is first used.

Will do.

>
> >
> > #define page_huge_pte(page) ((page)->pmd_huge_pte)
> >
> > +#define vmemmap_hpage_addr_end(addr, end) \
> > +({ \
> > + unsigned long __boundary; \
> > + __boundary = ((addr) + VMEMMAP_HPAGE_SIZE) & VMEMMAP_HPAGE_MASK;\
> > + (__boundary - 1 < (end) - 1) ? __boundary : (end); \
> > +})
> > +
> > static inline unsigned int nr_free_vmemmap(struct hstate *h)
> > {
> > return h->nr_free_vmemmap_pages;
> > @@ -1416,6 +1426,181 @@ static void __init hugetlb_vmemmap_init(struct hstate *h)
> > pr_info("HugeTLB: can free %d vmemmap pages for %s\n",
> > h->nr_free_vmemmap_pages, h->name);
> > }
> > +
> > +static inline spinlock_t *vmemmap_pmd_lockptr(pmd_t *pmd)
> > +{
> > + static DEFINE_SPINLOCK(pgtable_lock);
> > +
> > + return &pgtable_lock;
> > +}
>
> This is just a global lock. Correct? And hugetlb specific?

Yes, it is a global lock. Originally, I wanted to use the pmd lock(e.g.
pmd_lockptr()). But we need to allocate memory for the spinlock and
initialize it when ALLOC_SPLIT_PTLOCKS. It may increase the
complexity.

And I think that here alloc/free hugetlb pages is not a frequent operation.
So I finally use a global lock. Maybe it is enough.

>
> It should be OK as the page table entries for huegtlb pages will not
> overlap with other entries.

Does "hugetlb specific" mean the pmd lock? or per hugetlb lock?
If it is pmd lock, this is fine to me. If not, it may not be enough.
Because the lock also guards the splitting of pmd pgtable.

Thanks.
>
> > +
> > +/*
> > + * Walk a vmemmap address to the pmd it maps.
> > + */
> > +static pmd_t *vmemmap_to_pmd(const void *page)
> > +{
> > + unsigned long addr = (unsigned long)page;
> > + pgd_t *pgd;
> > + p4d_t *p4d;
> > + pud_t *pud;
> > + pmd_t *pmd;
> > +
> > + if (addr < VMEMMAP_START || addr >= VMEMMAP_END)
> > + return NULL;
> > +
> > + pgd = pgd_offset_k(addr);
> > + if (pgd_none(*pgd))
> > + return NULL;
> > + p4d = p4d_offset(pgd, addr);
> > + if (p4d_none(*p4d))
> > + return NULL;
> > + pud = pud_offset(p4d, addr);
> > +
> > + WARN_ON_ONCE(pud_bad(*pud));
> > + if (pud_none(*pud) || pud_bad(*pud))
> > + return NULL;
> > + pmd = pmd_offset(pud, addr);
> > +
> > + return pmd;
> > +}
>
> That routine is not really hugetlb specific. Perhaps we could move it
> to sparse-vmemmap.c? Or elsewhere?

Yeah, we can move it to sparse-vmemmap.c, maybe better.

>
> --
> Mike Kravetz



--
Yours,
Muchun