Re: [PATCH v4 0/6] mm/vmalloc: Speed up ioremap, vmalloc and vmap with contiguous memory

From: Barry Song

Date: Fri Jun 26 2026 - 07:10:24 EST

On Thu, Jun 25, 2026 at 2:37 PM Dev Jain <dev.jain@xxxxxxx> wrote:
>
>
>
> On 18/06/26 2:17 pm, Wen Jiang wrote:
> > This patchset accelerates ioremap, vmalloc, and vmap when the memory
> > is physically fully or partially contiguous. Two techniques are used:
> >
> > 1. Avoid page table rewalk when setting PTEs/PMDs for multiple memory
> > segments
> > 2. Use batched mappings wherever possible in both vmalloc and ARM64
> > layers
> >
> > Besides accelerating the mapping path, this also enables large
> > mappings (PMD and cont-PTE) for vmap, which are currently not
> > supported.
> >
> > Patches 1-2 extend ARM64 vmalloc CONT-PTE mapping to support multiple
> > CONT-PTE regions instead of just one.
> >
> > Patch 3 extracts a common helper vmap_set_ptes() that consolidates PTE
> > mapping logic between the ioremap and vmalloc/vmap paths, handling both
> > CONT_PTE and regular PTE mappings. This prepares for the next patch.
> >
> > Patch 4 extends the page table walk path to support page shifts other
> > than PAGE_SHIFT and eliminates the page table rewalk for huge vmalloc
> > mappings. The function is renamed from vmap_small_pages_range_noflush()
> > to vmap_pages_range_noflush_walk().
> >
> > Patches 5-6 add huge vmap support for contiguous pages, including
> > support for non-compound pages with pfn alignment verification.
> >
> > On the RK3588 8-core ARM64 SoC, with tasks pinned to a little core and
> > the performance CPUfreq policy enabled, benchmark results:
> >
> > * ioremap(1 MB): 1.35x faster (3407 ns -> 2526 ns)
> > * vmalloc(1 MB) mapping time (excluding allocation) with
> > VM_ALLOW_HUGE_VMAP: 1.42x faster (5.00 us -> 3.53 us)
> > * vmap(100MB) with order-8 pages: 8.3x faster (1235 us -> 149 us)
> >
> > Many thanks to Xueyuan Chen for his testing efforts on RK3588 boards.
> >
>
> I am still a little nervous about doing vmap-huge by default.
>
> We can play set_memory_* games on a vmap huge mapping partially, thus
> forcing a pgtable split, and not all arches can handle a kernel pgtable
> split.
>
> For arm64, we can handle that with BBML2_NOABORT, but interestingly, in
> change_memory_common, arch/arm64/mm/pageattr.c:
>
> area = find_vm_area((void *)addr);
> if (!area ||
> ((unsigned long)kasan_reset_tag((void *)end) >
> (unsigned long)kasan_reset_tag(area->addr) + area->size) ||
> ((area->flags & (VM_ALLOC | VM_ALLOW_HUGE_VMAP)) != VM_ALLOC))
> return -EINVAL;
>
> Even before my change fcf8dda8cc48, we were bailing out on
>
> !(area->flags & VM_ALLOC))
>
> So on arm64 we haven't been supporting set_memory_* for vmap memory at all, because
> it has VM_MAP set and not VM_ALLOC. Although we have a contradictory comment above
> this code so not sure if this was intentional:
>
> "Let's restrict ourselves to mappings created by vmalloc (or vmap)."
>
>
> So either there is no user in the kernel doing vmap + set_memory_* (looks like it
> by doing an LLM scan), or it is not fatal for set_memory_* to fail.

Hi Dev,

The primary purpose of vmap() is to provide the CPU with a
virtual mapping to access memory used by device drivers,
followed by the appropriate cache synchronization with the
device when necessary.

Given that, I think it's technically quite questionable to use
set_memory_xxx() to change the page table attributes of a vmap
area, especially for only part of an existing vmap mapping.

>
> But even if no one does it now, technically the API allows it.

In case we ever run into the rather subtle case where someone
calls set_memory_xxx() on a vmap() mapping, are you
suggesting that VM_ALLOW_HUGE_VMAP should also apply to
vmap(), rather than only vmalloc()?
Something like the concept below?

index 14e5a6f6cc76..204770474c60 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3559,13 +3559,15 @@ static inline unsigned int vm_shift(pgprot_t prot, unsigned long size)
 }

 static inline int get_vmap_batch_order(struct page **pages,
-		pgprot_t prot, unsigned int max_steps, unsigned int idx)
+		unsigned long flags, pgprot_t prot, unsigned int max_steps, unsigned int idx)
 {
 	unsigned int nr_contig;
 	int order;

 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
 		return 0;
+	if (!(flags & VM_ALLOW_HUGE_VMAP))
+		return 0;

 	nr_contig = num_pages_contiguous(&pages[idx], max_steps);
 	if (nr_contig < 2)
@@ -3583,7 +3585,7 @@ static inline int get_vmap_batch_order(struct page **pages,
 }

 static int vmap_batched(unsigned long addr, unsigned long end,
-		pgprot_t prot, struct page **pages)
+		unsigned long flags, pgprot_t prot, struct page **pages)
 {
 	unsigned int count = (end - addr) >> PAGE_SHIFT;
 	unsigned int prev_shift = 0, idx = 0;
@@ -3597,7 +3599,7 @@ static int vmap_batched(unsigned long addr, unsigned long end,

 	for (unsigned int i = 0; i < count; ) {
 		unsigned int shift = PAGE_SHIFT +
-			get_vmap_batch_order(pages, prot, count - i, i);
+			get_vmap_batch_order(pages, flags, prot, count - i, i);

 		if (!i)
 			prev_shift = shift;
@@ -3711,7 +3713,7 @@ void *vmap(struct page **pages, unsigned int count,
 		return NULL;

 	addr = (unsigned long)area->addr;
-	if (vmap_batched(addr, addr + size, pgprot_nx(prot),
+	if (vmap_batched(addr, addr + size, flags, pgprot_nx(prot),
 			 pages) < 0) {
 		vunmap(area->addr);
 		return NULL;

I feel it's unnecessary. kvmalloc() is a generic memory
allocation API, and it already uses VM_ALLOW_HUGE_VMAP,
which makes set_memory_xxx() fail on arm64, as you
mentioned:

	/*
	 * kvmalloc() can always use VM_ALLOW_HUGE_VMAP,
	 * since the callers already cannot assume anything
	 * about the resulting pointer, and cannot play
	 * protection games.
	 */
	return __vmalloc_node_range_noprof(size, align, VMALLOC_START,
			VMALLOC_END,
			flags, PAGE_KERNEL, allow_block ? VM_ALLOW_HUGE_VMAP : 0,
			node, __builtin_return_address(0));
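
To make the flag test concrete, here is a minimal userspace
sketch of just the predicate from the change_memory_common()
snippet quoted above. The helper name set_memory_flags_ok()
is hypothetical and the flag values are stand-ins assumed for
illustration, so treat it as a model of the check rather than
kernel code.

```c
#include <stdbool.h>

/*
 * Stand-ins for the kernel's vm_struct flag bits. The numeric
 * values are assumptions made for this sketch only.
 */
#define VM_ALLOC		0x002UL
#define VM_MAP			0x004UL
#define VM_ALLOW_HUGE_VMAP	0x400UL

/*
 * Hypothetical helper modelling only the flag part of the arm64
 * check: the area passes when VM_ALLOC is set and
 * VM_ALLOW_HUGE_VMAP is clear. Any other combination corresponds
 * to the -EINVAL path in the quoted code.
 */
static bool set_memory_flags_ok(unsigned long area_flags)
{
	return (area_flags & (VM_ALLOC | VM_ALLOW_HUGE_VMAP)) == VM_ALLOC;
}
```

Under these assumptions, a plain VM_ALLOC area passes, while
a VM_MAP area (vmap) and a VM_ALLOC | VM_ALLOW_HUGE_VMAP area
(the kvmalloc() case above) are both rejected.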

Best Regards
Barry