Re: [PATCH v9 3/4] mm/vmalloc: free unused pages on vrealloc() shrink
From: Alice Ryhl
Date: Wed Apr 01 2026 - 17:23:07 EST
On Wed, Apr 01, 2026 at 10:46:35PM +0530, Shivam Kalra via B4 Relay wrote:
> From: Shivam Kalra <shivamkalra98@xxxxxxxxxxx>
>
> When vrealloc() shrinks an allocation and the new size crosses a page
> boundary, unmap and free the tail pages that are no longer needed. This
> reclaims physical memory that was previously wasted for the lifetime
> of the allocation.
>
> The heuristic is simple: always free when at least one full page becomes
> unused. Huge page allocations (page_order > 0) are skipped, as partial
> freeing would require splitting. Allocations with VM_FLUSH_RESET_PERMS
> are also skipped, as their direct-map permissions must be reset before
> pages are returned to the page allocator, which is handled by
> vm_reset_perms() during vfree().
>
> Additionally, allocations with VM_USERMAP are skipped because
> remap_vmalloc_range_partial() validates mapping requests against the
> unchanged vm->size; freeing tail pages would cause vmalloc_to_page()
> to return NULL for the unmapped range.
>
> To protect concurrent readers, the shrink path takes the vmap node
> lock to synchronize with them before freeing the pages.
>
> Finally, we notify kmemleak of the reduced allocation size using
> kmemleak_free_part() to prevent the kmemleak scanner from faulting on
> the newly unmapped virtual addresses.
>
> The virtual address reservation (vm->size / vmap_area) is intentionally
> kept unchanged, preserving the address for potential future grow-in-place
> support.
>
> Suggested-by: Danilo Krummrich <dakr@xxxxxxxxxx>
> Signed-off-by: Shivam Kalra <shivamkalra98@xxxxxxxxxxx>
> ---
> mm/vmalloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 52 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 1c6d747220ce..a7731e54560b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4359,14 +4359,62 @@ void *vrealloc_node_align_noprof(const void *p, size_t size, unsigned long align
> goto need_realloc;
> }
>
> - /*
> - * TODO: Shrink the vm_area, i.e. unmap and free unused pages. What
> - * would be a good heuristic for when to shrink the vm_area?
> - */
> if (size <= old_size) {
> + unsigned int new_nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> /* Zero out "freed" memory, potentially for future realloc. */
> if (want_init_on_free() || want_init_on_alloc(flags))
> memset((void *)p + size, 0, old_size - size);
> +
> + /*
> + * Free tail pages when shrink crosses a page boundary.
> + *
> + * Skip huge page allocations (page_order > 0) as partial
> + * freeing would require splitting.
> + *
> + * Skip VM_FLUSH_RESET_PERMS, as direct-map permissions must
> + * be reset before pages are returned to the allocator.
> + *
> + * Skip VM_USERMAP, as remap_vmalloc_range_partial() validates
> + * mapping requests against the unchanged vm->size; freeing
> + * tail pages would cause vmalloc_to_page() to return NULL for
> + * the unmapped range.
> + *
> + * Skip if either GFP_NOFS or GFP_NOIO is used.
> + * kmemleak_free_part() internally allocates with
> + * GFP_KERNEL, which could trigger a recursive deadlock
> + * if we are under filesystem or I/O reclaim.
> + */
> + if (new_nr_pages < vm->nr_pages && !vm_area_page_order(vm) &&
> + !(vm->flags & (VM_FLUSH_RESET_PERMS | VM_USERMAP)) &&
> + gfp_has_io_fs(flags)) {
> + unsigned long addr = (unsigned long)kasan_reset_tag(p);
> + unsigned int old_nr_pages = vm->nr_pages;
> + unsigned long start = addr + ((unsigned long)new_nr_pages << PAGE_SHIFT);
> + unsigned long end = addr + ((unsigned long)old_nr_pages << PAGE_SHIFT);
> +
> + /* Notify kmemleak of the reduced allocation size before unmapping. */
> + kmemleak_free_part((void *)start, end - start);
> +
> + vunmap_range(start, end);
> +
> + /*
> + * Use the node lock to synchronize with concurrent
> + * readers (vmalloc_info_show).
> + */
> + struct vmap_node *vn = addr_to_node(addr);
> +
> + spin_lock(&vn->busy.lock);
> + vm->nr_pages = new_nr_pages;
> + spin_unlock(&vn->busy.lock);
Should we set nr_pages first? Right now, another thread may observe the
range being unmapped but still see the old nr_pages value.
> + vm_area_free_pages(vm, new_nr_pages, old_nr_pages);
> + }
> vm->requested_size = size;
> kasan_vrealloc(p, old_size, size);
> return (void *)p;
>
> --
> 2.43.0
>
>