Re: [PATCH v10 11/12] mm/vmalloc: Hugepage vmalloc mappings

From: Christoph Hellwig
Date: Sun Jan 24 2021 - 10:09:12 EST


On Sun, Jan 24, 2021 at 06:22:29PM +1000, Nicholas Piggin wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..f87feb616184 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -724,6 +724,16 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> config HAVE_ARCH_HUGE_VMAP
> bool
>
> +config HAVE_ARCH_HUGE_VMALLOC
> + depends on HAVE_ARCH_HUGE_VMAP
> + bool
> + help
> + Archs that select this would be capable of PMD-sized vmaps (i.e.,
> + arch_vmap_pmd_supported() returns true), and they must make no
> + assumptions that vmalloc memory is mapped with PAGE_SIZE ptes. The
> + VM_NOHUGE flag can be used to prohibit arch-specific allocations from
> + using hugepages to help with this (e.g., modules may require it).

Help texts don't make sense for options that aren't user visible.

More importantly, is there any good reason to keep the option and not
just go the extra step and enable huge page vmalloc for arm64 and x86
as well?

> +static inline bool is_vm_area_hugepages(const void *addr)
> +{
> + /*
> + * This may not 100% tell if the area is mapped with > PAGE_SIZE
> + * page table entries, if for some reason the architecture indicates
> + * larger sizes are available but decides not to use them, nothing
> + * prevents that. This only indicates the size of the physical page
> + * allocated in the vmalloc layer.
> + */
> + return (find_vm_area(addr)->page_order > 0);

No need for the parentheses here.
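
i.e., just:

	return find_vm_area(addr)->page_order > 0;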

> }
>
> +static int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages, unsigned int page_shift)
> +{
> + unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
> +
> + WARN_ON(page_shift < PAGE_SHIFT);
> +
> + if (page_shift == PAGE_SHIFT)
> + return vmap_small_pages_range_noflush(addr, end, prot, pages);

This begs for an IS_ENABLED check to disable the hugepage code for
architectures that don't need it.
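
Something like this (untested sketch) would let the compiler throw
away the huge page path entirely on architectures that don't select
CONFIG_HAVE_ARCH_HUGE_VMALLOC:

	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
	    page_shift == PAGE_SHIFT)
		return vmap_small_pages_range_noflush(addr, end, prot, pages);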

> +int map_kernel_range_noflush(unsigned long addr, unsigned long size,
> + pgprot_t prot, struct page **pages)
> +{
> + return vmap_pages_range_noflush(addr, addr + size, prot, pages, PAGE_SHIFT);
> +}

Please just kill off map_kernel_range_noflush and map_kernel_range
entirely in favor of the vmap versions.
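
That is, open code the wrapper at the call sites, along these lines
(sketch only; which of the vmap helpers ends up exported to callers
is up to you):

	/* was: map_kernel_range_noflush(addr, size, prot, pages); */
	vmap_pages_range_noflush(addr, addr + size, prot, pages, PAGE_SHIFT);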

> + for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {

Maybe use a helper that takes the vm_struct and either returns
area->page_order or always 0, based on IS_ENABLED?
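
Something like this (the name is just a suggestion):

	static inline unsigned int vm_area_page_order(struct vm_struct *area)
	{
		if (IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
			return area->page_order;
		return 0;
	}

and the loop then reads:

	for (i = 0; i < area->nr_pages; i += 1U << vm_area_page_order(area))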