Re: [PATCH v4 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory

From: Andrew Morton

Date: Wed Jun 24 2026 - 22:57:13 EST

On Thu, 18 Jun 2026 16:47:20 +0800 Wen Jiang <jiangwenxiaomi@xxxxxxxxx> wrote:

> This patchset accelerates ioremap, vmalloc, and vmap when the memory
> is physically fully or partially contiguous. Two techniques are used:

Thanks.

> 1. Avoid page table rewalk when setting PTEs/PMDs for multiple memory
> segments
> 2. Use batched mappings wherever possible in both vmalloc and ARM64
> layers
>
> Besides accelerating the mapping path, this also enables large
> mappings (PMD and cont-PTE) for vmap, which are currently not
> supported.
>
> Patches 1-2 extend ARM64 vmalloc CONT-PTE mapping to support multiple
> CONT-PTE regions instead of just one.
>
> Patch 3 extracts a common helper vmap_set_ptes() that consolidates PTE
> mapping logic between the ioremap and vmalloc/vmap paths, handling both
> CONT_PTE and regular PTE mappings. This prepares for the next patch.
>
> Patch 4 extends the page table walk path to support page shifts other
> than PAGE_SHIFT and eliminates the page table rewalk for huge vmalloc
> mappings. The function is renamed from vmap_small_pages_range_noflush()
> to vmap_pages_range_noflush_walk().
>
> Patches 5-6 add huge vmap support for contiguous pages, including
> support for non-compound pages with pfn alignment verification.
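
For readers following along, the pfn alignment check described for patches 5-6 can be pictured with a small userspace sketch. This is not code from the patchset: contig_run_len() and usable_order() are hypothetical names, and the sketch assumes the caller already has an array of page frame numbers. It only shows the general idea that a large mapping is usable when the starting pfn is aligned to the mapping size and the contiguous run is long enough; the matching virtual address alignment check is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patchset): count how many entries,
 * starting at index 'start', have consecutive page frame numbers.
 */
static size_t contig_run_len(const unsigned long *pfns, size_t n, size_t start)
{
	size_t len = 1;

	assert(start < n);
	while (start + len < n && pfns[start + len] == pfns[start] + len)
		len++;
	return len;
}

/*
 * Hypothetical helper (not from the patchset): pick the largest order
 * (log2 of the page count), capped at max_order, such that 'pfn' is
 * aligned to 1 << order pages and the contiguous run covers that many
 * pages. Returns 0 when only a single-page mapping is possible.
 */
static unsigned int usable_order(unsigned long pfn, size_t run_len,
				 unsigned int max_order)
{
	unsigned int order = max_order;

	while (order > 0) {
		unsigned long nr = 1UL << order;

		if ((pfn & (nr - 1)) == 0 && run_len >= nr)
			break;
		order--;
	}
	return order;
}
```

With pfns 0x200..0x203 followed by 0x300, the first run is four pages long and starts on a four-page boundary, so usable_order() picks order 2 there. A run starting at 0x201 falls back to order 0 because the start is not aligned.
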
>
> On the RK3588 8-core ARM64 SoC, with tasks pinned to a little core and
> the performance CPUfreq policy enabled, benchmark results:
>
> * ioremap(1 MB): 1.35x faster (3407 ns -> 2526 ns)
> * vmalloc(1 MB) mapping time (excluding allocation) with
> VM_ALLOW_HUGE_VMAP: 1.42x faster (5.00 us -> 3.53 us)
> * vmap(100 MB) with order-8 pages: 8.3x faster (1235 us -> 149 us)

Nice.

> Many thanks to Xueyuan Chen for his testing efforts on RK3588 boards.

Indeed.

I see Dev had a good look at v3 - hopefully he (and Ulad) (and more ARM
folks) have time to go through this.

Is there any effect on anything other than arm64? I'm wondering how
much testing these changes will really get in mm.git and linux-next.

How is our selftests coverage of these changes? Is there some existing
selftest which will exercise these new features?

You diligently went through the Sashiko report against v3 (thanks).
Please cast an eye over its v4 report to see whether anything new
popped up:
https://sashiko.dev/#/patchset/20260618084726.1070022-1-jiangwen6@xxxxxxxxxx