Re: [PATCH v3 8/8] mm/vmalloc: Hugepage vmalloc mappings

From: Jonathan Cameron
Date: Wed Aug 12 2020 - 12:20:06 EST


On Wed, 12 Aug 2020 13:25:24 +0100
Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote:

> On Mon, 10 Aug 2020 12:27:32 +1000
> Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> > On platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmaps,
> > vmalloc will attempt to allocate PMD-sized pages first, before falling
> > back to small pages.
> >
> > Allocations which use something other than PAGE_KERNEL protections are
> > not permitted to use huge pages yet, because not all callers expect this
> > (e.g., module allocations vs strict module rwx).
> >
> > This reduces TLB misses by nearly 30x on a `git diff` workload on a
> > 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> >
> > This can result in more internal fragmentation and memory overhead for a
> > given allocation; a boot option, nohugevmap, is added to disable it.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
> Hi Nicholas,
>
> Busy afternoon, but a possible point of interest in line in the meantime.
>

I did manage to get back to this.

The issue, I think, is that arm64 defines THREAD_ALIGN as 2 * THREAD_SIZE
when CONFIG_VMAP_STACK is enabled. There is a comment in
arch/arm64/include/asm/memory.h saying that this is to allow cheap checking
of stack overflow.
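
For reference, the relevant definition looks roughly like this (from
memory, so take the comment wording as my paraphrase rather than the exact
text in the header):

  #ifdef CONFIG_VMAP_STACK
  /*
   * Aligning VMAP'd stacks to 2 * THREAD_SIZE lets stack overflow be
   * detected with a cheap alignment check on the stack pointer.
   */
  #define THREAD_ALIGN	(2 * THREAD_SIZE)
  #else
  #define THREAD_ALIGN	THREAD_SIZE
  #endif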

A quick grep suggests ARM64 is the only architecture to do this...

Jonathan



>
> ...
>
> > @@ -2701,22 +2760,45 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > pgprot_t prot, unsigned long vm_flags, int node,
> > const void *caller)
> > {
> > - struct vm_struct *area;
> > + struct vm_struct *area = NULL;
> > void *addr;
> > unsigned long real_size = size;
> > + unsigned long real_align = align;
> > + unsigned int shift = PAGE_SHIFT;
> >
> > size = PAGE_ALIGN(size);
> > if (!size || (size >> PAGE_SHIFT) > totalram_pages())
> > goto fail;
> >
> > - area = __get_vm_area_node(real_size, align, VM_ALLOC | VM_UNINITIALIZED |
> > + if (vmap_allow_huge && (pgprot_val(prot) == pgprot_val(PAGE_KERNEL))) {
> > + unsigned long size_per_node;
> > +
> > + /*
> > + * Try huge pages. Only try for PAGE_KERNEL allocations,
> > + * others like modules don't yet expect huge pages in
> > + * their allocations due to apply_to_page_range not
> > + * supporting them.
> > + */
> > +
> > + size_per_node = size;
> > + if (node == NUMA_NO_NODE)
> > + size_per_node /= num_online_nodes();
> > + if (size_per_node >= PMD_SIZE)
> > + shift = PMD_SHIFT;
> > + }
> > +
> > +again:
> > + align = max(real_align, 1UL << shift);
> > + size = ALIGN(real_size, align);
>
> So my suspicion is that the issue on arm64 is related to this.
> In the relevant call path, align is 32K whilst the size is 16K.
>
> Previously I don't think we forced size to be a multiple of align.
>
> I think this results in nr_pages being double what it was before.
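>
> As a rough sketch of the arithmetic I have in mind (assuming a 4K page
> size, so THREAD_SIZE is 16K and THREAD_ALIGN is 32K with
> CONFIG_VMAP_STACK; shift stays at PAGE_SHIFT since the allocation is
> well below PMD_SIZE):
>
>   old: size = PAGE_ALIGN(16K)    = 16K  -> nr_pages = 4
>   new: align = max(32K, 4K)      = 32K
>        size  = ALIGN(16K, 32K)   = 32K  -> nr_pages = 8
>
> i.e. each VMAP'd kernel stack would be backed by twice as many pages.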
>
>
> > +
> > + area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED |
> > vm_flags, start, end, node, gfp_mask, caller);
> > if (!area)
> > goto fail;
> >
> > - addr = __vmalloc_area_node(area, gfp_mask, prot, node);
> > + addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
> > if (!addr)
> > - return NULL;
> > + goto fail;
> >
> > /*
> > * In this function, newly allocated vm_struct has VM_UNINITIALIZED
> > @@ -2730,8 +2812,16 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > return addr;
> >
> > fail:
> > - warn_alloc(gfp_mask, NULL,
> > + if (shift > PAGE_SHIFT) {
> > + shift = PAGE_SHIFT;
> > + goto again;
> > + }
> > +
> > + if (!area) {
> > + /* Warn for area allocation, page allocations already warn */
> > + warn_alloc(gfp_mask, NULL,
> > "vmalloc: allocation failure: %lu bytes", real_size);
> > + }
> > return NULL;
> > }
> >
>
>
>