Re: [PATCH v3 8/8] mm/vmalloc: Hugepage vmalloc mappings

From: Jonathan Cameron
Date: Wed Aug 12 2020 - 08:27:01 EST


On Mon, 10 Aug 2020 12:27:32 +1000
Nicholas Piggin <npiggin@xxxxxxxxx> wrote:

> On platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmaps,
> vmalloc will attempt to allocate PMD-sized pages first, before falling
> back to small pages.
>
> Allocations which use something other than PAGE_KERNEL protections are
> not permitted to use huge pages yet, because not all callers expect
> this (e.g., module allocations vs strict module rwx).
>
> This reduces TLB misses by nearly 30x on a `git diff` workload on a
> 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>
> This can result in more internal fragmentation and memory overhead for
> a given allocation, so an option, nohugevmap, is added to disable it
> at boot.
>
> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
Hi Nicholas,

Busy afternoon, but a possible point of interest inline in the meantime.


...

> @@ -2701,22 +2760,45 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> pgprot_t prot, unsigned long vm_flags, int node,
> const void *caller)
> {
> - struct vm_struct *area;
> + struct vm_struct *area = NULL;
> void *addr;
> unsigned long real_size = size;
> + unsigned long real_align = align;
> + unsigned int shift = PAGE_SHIFT;
>
> size = PAGE_ALIGN(size);
> if (!size || (size >> PAGE_SHIFT) > totalram_pages())
> goto fail;
>
> - area = __get_vm_area_node(real_size, align, VM_ALLOC | VM_UNINITIALIZED |
> + if (vmap_allow_huge && (pgprot_val(prot) == pgprot_val(PAGE_KERNEL))) {
> + unsigned long size_per_node;
> +
> + /*
> + * Try huge pages. Only try for PAGE_KERNEL allocations,
> + * others like modules don't yet expect huge pages in
> + * their allocations due to apply_to_page_range not
> + * supporting them.
> + */
> +
> + size_per_node = size;
> + if (node == NUMA_NO_NODE)
> + size_per_node /= num_online_nodes();
> + if (size_per_node >= PMD_SIZE)
> + shift = PMD_SHIFT;
> + }
> +
> +again:
> + align = max(real_align, 1UL << shift);
> + size = ALIGN(real_size, align);

So my suspicion is that the issue on arm64 is related to this.
In the relevant call path, align is 32K whilst the size is 16K.

Previously I don't think we forced size to be a multiple of align.

I think this results in nr_pages being double what it was before.
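
To make that concrete, here is a rough worked example (my assumption:
the arm64 16K kernel-stack path with 4K base pages, where the caller
passes align = THREAD_ALIGN = 32K and real_size = THREAD_SIZE = 16K):

	/*
	 * Illustration only; values assumed from the arm64 stack
	 * allocation path, not code from the patch.
	 */
	shift = PAGE_SHIFT;                    /* 16K < PMD_SIZE, so no huge page */
	align = max(real_align, 1UL << shift); /* max(32K, 4K)   = 32K */
	size  = ALIGN(real_size, align);       /* ALIGN(16K, 32K) = 32K */
	/* nr_pages = size >> PAGE_SHIFT = 8, where it used to be 4 */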


> +
> + area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED |
> vm_flags, start, end, node, gfp_mask, caller);
> if (!area)
> goto fail;
>
> - addr = __vmalloc_area_node(area, gfp_mask, prot, node);
> + addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
> if (!addr)
> - return NULL;
> + goto fail;
>
> /*
> * In this function, newly allocated vm_struct has VM_UNINITIALIZED
> @@ -2730,8 +2812,16 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> return addr;
>
> fail:
> - warn_alloc(gfp_mask, NULL,
> + if (shift > PAGE_SHIFT) {
> + shift = PAGE_SHIFT;
> + goto again;
> + }
> +
> + if (!area) {
> + /* Warn for area allocation, page allocations already warn */
> + warn_alloc(gfp_mask, NULL,
> "vmalloc: allocation failure: %lu bytes", real_size);
> + }
> return NULL;
> }
>
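
If that is what's going on, one possible way out (only a sketch, not
tested) would be to round size up to the page-table level actually in
use rather than to the caller's alignment:

again:
	align = max(real_align, 1UL << shift);
	size = ALIGN(real_size, 1UL << shift);

That would keep the caller's stricter alignment for placing the area
but leave a 16K request at 16K whenever shift == PAGE_SHIFT.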