Re: [External] Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page
From: Muchun Song
Date: Mon Dec 21 2020 - 06:26:51 EST
On Mon, Dec 21, 2020 at 5:11 PM Oscar Salvador <osalvador@xxxxxxx> wrote:
>
> On Thu, Dec 17, 2020 at 08:12:55PM +0800, Muchun Song wrote:
> > +static inline void free_bootmem_page(struct page *page)
> > +{
> > + unsigned long magic = (unsigned long)page->freelist;
> > +
> > + /*
> > + * The reserve_bootmem_region sets the reserved flag on bootmem
> > + * pages.
> > + */
> > + VM_WARN_ON(page_ref_count(page) != 2);
> > +
> > + if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> > + put_page_bootmem(page);
> > + else
> > + VM_WARN_ON(1);
>
> Ideally, I think we want to see how the page looks, since its state
> is not what we expected, so maybe join both conditions and use dump_page().
Agree. Will do. Thanks.
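
Roughly what I have in mind, written as a userspace sketch so it compiles
on its own. struct page_model, dump_page_model() and the two magic values
are stand-ins made up for this mail, not the kernel definitions, and the
final form in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Placeholder values; the real constants come from the kernel headers. */
#define SECTION_INFO		12UL
#define MIX_SECTION_INFO	13UL

/* Hypothetical stand-in for the few struct page fields the check reads. */
struct page_model {
	unsigned long magic;	/* models (unsigned long)page->freelist */
	int refcount;		/* models page_ref_count(page) */
};

/* Hypothetical stand-in for dump_page(): print the unexpected state. */
static void dump_page_model(const struct page_model *page, const char *reason)
{
	fprintf(stderr, "page dumped because: %s (magic=%lu refcount=%d)\n",
		reason, page->magic, page->refcount);
}

/*
 * Both checks joined into one condition. Returns true when the page is in
 * the expected state (where the real code would call put_page_bootmem()),
 * and false after dumping it.
 */
static bool bootmem_page_ok(const struct page_model *page)
{
	bool known_magic = page->magic == SECTION_INFO ||
			   page->magic == MIX_SECTION_INFO;

	if (page->refcount != 2 || !known_magic) {
		dump_page_model(page, "unexpected bootmem page state");
		return false;
	}
	return true;
}
```

With this shape, a page with the wrong refcount and a page with an unknown
magic both end up in the same dump path.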
>
> > + * By removing redundant page structs for HugeTLB pages, memory can returned to
> ^^ be
Thanks.
> > + * the buddy allocator for other uses.
>
> [...]
>
> > +void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> > +{
> > + unsigned long vmemmap_addr = (unsigned long)head;
> > +
> > + if (!free_vmemmap_pages_per_hpage(h))
> > + return;
> > +
> > + vmemmap_remap_free(vmemmap_addr + RESERVE_VMEMMAP_SIZE,
> > + free_vmemmap_pages_size_per_hpage(h));
>
> I am not sure what others think, but I would like to see vmemmap_remap_free taking
> three arguments: start, end, and reuse addr, e.g:
>
> void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> {
> unsigned long vmemmap_addr = (unsigned long)head;
> unsigned long vmemmap_end, vmemmap_reuse;
>
> if (!free_vmemmap_pages_per_hpage(h))
> return;
>
> vmemmap_addr += RESERVE_VMEMMAP_SIZE;
> vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
> vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>
> vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
> }
>
> The reason for me to do this is to let the callers of vmemmap_remap_free decide
> __what__ they want to remap.
>
> More on this below.
>
>
> > +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> > + unsigned long end,
> > + struct vmemmap_remap_walk *walk)
> > +{
> > + pte_t *pte;
> > +
> > + pte = pte_offset_kernel(pmd, addr);
> > +
> > + if (walk->reuse_addr == addr) {
> > + BUG_ON(pte_none(*pte));
> > + walk->reuse_page = pte_page(*pte++);
> > + addr += PAGE_SIZE;
> > + }
>
> Although it is quite obvious, a brief comment here pointing out what we are
> doing and that this is meant to be set only once would be nice.
OK. Will do.
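
To make sure we mean the same thing by "set only once", here is a toy
userspace model of the pte walk with the comment I would add. Pages are
modelled as small integers, and struct walk_model, pte_range_model() and
PAGE_SIZE_MODEL are invented for this mail. The loop that rewrites the
remaining entries is my assumption of what the rest of the function does,
since that part is not quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_MODEL 4096UL

/* Made-up stand-in for the walk state; pages are modelled as small ints. */
struct walk_model {
	unsigned long reuse_addr;
	int reuse_page;		/* -1 until captured */
	size_t remapped;	/* number of entries rewritten */
};

/*
 * ptes[i] models the PTE for address base + i * PAGE_SIZE_MODEL and holds
 * the id of the page it maps. Walks the range [addr, end).
 */
static void pte_range_model(int *ptes, unsigned long base, unsigned long addr,
			    unsigned long end, struct walk_model *walk)
{
	int *pte = &ptes[(addr - base) / PAGE_SIZE_MODEL];

	/*
	 * The walk starts at the reuse address: remember which page backs
	 * it and skip that entry. addr then moves past reuse_addr, so in
	 * this model the branch is taken at most once per walk.
	 */
	if (walk->reuse_addr == addr) {
		walk->reuse_page = *pte++;
		addr += PAGE_SIZE_MODEL;
	}

	/* Point every remaining entry at the captured reuse page. */
	for (; addr != end; addr += PAGE_SIZE_MODEL, pte++) {
		*pte = walk->reuse_page;
		walk->remapped++;
	}
}
```

For four entries mapping pages 10..13 with the reuse address on the first
one, the model leaves all four entries pointing at page 10 and counts three
remapped entries.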
>
>
> > +static void vmemmap_remap_range(unsigned long start, unsigned long end,
> > + struct vmemmap_remap_walk *walk)
> > +{
> > + unsigned long addr = start - PAGE_SIZE;
> > + unsigned long next;
> > + pgd_t *pgd;
> > +
> > + VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> > + VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> > +
> > + walk->reuse_page = NULL;
> > + walk->reuse_addr = addr;
>
> With the change I suggested above, struct vmemmap_remap_walk should be
> initialized at once in vmemmap_remap_free, so this should no longer be needed.
You are right.
> (And btw, you do not need to set reuse_page to NULL, the way you init the struct
> in vmemmap_remap_free makes sure to null any field you do not explicitly set).
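
Right. C zero-initializes every member that a designated initializer does
not name, so the explicit NULL is redundant. A small standalone check of
that rule, where struct walk_model is a simplified stand-in I made up for
struct vmemmap_remap_walk (the field types are not the real ones):

```c
#include <assert.h>
#include <stddef.h>

/* Made-up stand-in for struct vmemmap_remap_walk; field types simplified. */
struct walk_model {
	void (*remap_pte)(void);
	unsigned long reuse_addr;
	void *reuse_page;
	void *vmemmap_pages;
};

static void remap_pte_stub(void)
{
}

/* Initialize only some members, the way vmemmap_remap_free() would. */
static struct walk_model make_walk(unsigned long reuse, void *pages)
{
	struct walk_model walk = {
		.reuse_addr	= reuse,
		.remap_pte	= remap_pte_stub,
		.vmemmap_pages	= pages,
	};

	/* reuse_page is not named above, so it starts out as NULL. */
	return walk;
}
```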
>
>
> > +static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> > + struct vmemmap_remap_walk *walk)
> > +{
> > + /*
> > + * Make the tail pages are mapped with read-only to catch
> > + * illegal write operation to the tail pages.
> "Remap the tail pages as read-only to ..."
Thanks.
>
> > + */
> > + pgprot_t pgprot = PAGE_KERNEL_RO;
> > + pte_t entry = mk_pte(walk->reuse_page, pgprot);
> > + struct page *page;
> > +
> > + page = pte_page(*pte);
>
> struct page *page = pte_page(*pte);
>
> since you did the same for the other two.
Yeah. Will change to this.
>
> > + list_add(&page->lru, walk->vmemmap_pages);
> > +
> > + set_pte_at(&init_mm, addr, pte, entry);
> > +}
> > +
> > +/**
> > + * vmemmap_remap_free - remap the vmemmap virtual address range
> > + * [start, start + size) to the page which
> > + * [start - PAGE_SIZE, start) is mapped,
> > + * then free vmemmap pages.
> > + * @start: start address of the vmemmap virtual address range
> > + * @size: size of the vmemmap virtual address range
> > + */
> > +void vmemmap_remap_free(unsigned long start, unsigned long size)
> > +{
> > + unsigned long end = start + size;
> > + LIST_HEAD(vmemmap_pages);
> > +
> > + struct vmemmap_remap_walk walk = {
> > + .remap_pte = vmemmap_remap_pte,
> > + .vmemmap_pages = &vmemmap_pages,
> > + };
>
> As stated above, this would become:
>
> void vmemmap_remap_free(unsigned long start, unsigned long end,
> unsigned long reuse)
> {
> LIST_HEAD(vmemmap_pages);
> struct vmemmap_remap_walk walk = {
> .reuse_addr = reuse,
> .remap_pte = vmemmap_remap_pte,
> .vmemmap_pages = &vmemmap_pages,
> };
>
> You might have had your reasons to do this way, but this looks more natural
> to me, with the plus that callers of vmemmap_remap_free can specify
> what they want to remap.
Should we add a BUG_ON in vmemmap_remap_free() for now?
BUG_ON(reuse != start - PAGE_SIZE);
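
For reference, the address arithmetic from your free_huge_page_vmemmap()
example as a userspace sketch. With that layout the reuse address sits
exactly one page below start, which is what such a check would enforce.
struct remap_range, both helpers and PAGE_SIZE_MODEL are hypothetical names
for this mail, and the reserved size is passed in as a parameter rather
than assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE_MODEL 4096UL

/* Hypothetical container for the three addresses the caller computes. */
struct remap_range {
	unsigned long start;	/* first vmemmap address to remap */
	unsigned long end;	/* one past the last address to remap */
	unsigned long reuse;	/* address whose page is reused */
};

/*
 * head:         vmemmap address of the head struct page
 * reserve_size: bytes of vmemmap kept mapped for this huge page
 * free_size:    bytes of vmemmap to remap and free
 */
static struct remap_range hugepage_remap_range(unsigned long head,
					       unsigned long reserve_size,
					       unsigned long free_size)
{
	struct remap_range r;

	r.start = head + reserve_size;
	r.end = r.start + free_size;
	r.reuse = r.start - PAGE_SIZE_MODEL;
	return r;
}

/* The condition the BUG_ON would guard: reuse is the page below start. */
static bool reuse_is_adjacent(const struct remap_range *r)
{
	return r->start - r->reuse == PAGE_SIZE_MODEL;
}
```

The check can be dropped later if a caller needs a reuse page that is not
adjacent to the range.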
>
>
> --
> Oscar Salvador
> SUSE L3
--
Yours,
Muchun