Re: [PATCH v3] mm/page_alloc: replace kernel_init_pages() with batch page clearing

From: David Hildenbrand (Arm)

Date: Wed Apr 22 2026 - 14:27:13 EST


On 4/22/26 12:26, Hrushikesh Salunke wrote:
> When init_on_alloc is enabled, kernel_init_pages() clears every page
> one at a time via clear_highpage_kasan_tagged(), which incurs per-page
> kmap_local_page()/kunmap_local() overhead and prevents the architecture
> clearing primitive from operating on contiguous ranges.
>
> Introduce clear_highpages_kasan_tagged() in highmem.h, a batch
> clearing helper that calls clear_pages() for the full contiguous range
> on !HIGHMEM systems, bypassing the per-page kmap overhead and allowing
> a single invocation of the arch clearing primitive across the entire
> allocation. The HIGHMEM path falls back to per-page clearing since
> those pages require kmap.
>
> Replace kernel_init_pages() with direct calls to the new helper, as it
> becomes a trivial wrapper.
>
> Allocating 8192 x 2MB HugeTLB pages (16GB) with init_on_alloc=1:
>
> Before: 0.445s
> After: 0.166s (-62.7%, 2.68x faster)
>
> Kernel time (sys) reduction per workload with init_on_alloc=1:
>
> Workload            Before       After        Change
> Graph500 64C128T    30m 41.8s    15m 14.8s    -50.3%
> Graph500 16C32T     15m 56.7s     9m 43.7s    -39.0%
> Pagerank 32T         1m 58.5s     1m 12.8s    -38.5%
> Pagerank 128T        2m 36.3s     1m 40.4s    -35.7%

We do have some elaborate handling in clear_contig_highpages() to chunk it up
(and to call cond_resched()). But that function can get called with much bigger
ranges.
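
For illustration only, the chunking pattern can be sketched in userspace C. This is not the kernel's clear_contig_highpages(): the function name clear_range_chunked() is hypothetical, memset() stands in for the arch clearing primitive, and the comment marks where a rescheduling point such as cond_resched() would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of chunked clearing: zero `len` bytes
 * in pieces of at most `chunk` bytes, so that no single clearing call
 * covers more than one chunk. Returns the number of clearing calls made.
 */
static size_t clear_range_chunked(unsigned char *buf, size_t len, size_t chunk)
{
    size_t calls = 0;

    assert(chunk > 0); /* a zero chunk size would never make progress */

    while (len > 0) {
        size_t n = len < chunk ? len : chunk;

        memset(buf, 0, n); /* stand-in for the arch clearing primitive */
        buf += n;
        len -= n;
        calls++;
        /* kernel code would place its rescheduling point here */
    }
    return calls;
}
```

The chunk size bounds how long any single clearing call can run, which is the property that matters for the concern about long uninterruptible instructions.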

I'm not concerned about the cond_resched() -- we wouldn't do one here before --
but I'm wondering whether we could end up triggering a HW instruction that is
uninterruptible and takes a rather long time.

But clear_contig_highpages() breaks the range into 32MiB chunks, and only x86
implements the multi-page clearing primitive so far. Since the maximum buddy
order corresponds to 4MiB on x86, we won't exceed that chunk size here.
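
The size argument can be checked with a little arithmetic. The sketch below assumes 4KiB base pages and a maximum buddy order of 10 (the x86 default); the SKETCH_* names and both helpers are made up for this example and are not kernel identifiers.

```c
#include <assert.h>
#include <limits.h>

/* Assumed values for illustration: 4KiB pages, maximum buddy order 10. */
#define SKETCH_PAGE_SIZE      4096UL
#define SKETCH_MAX_PAGE_ORDER 10U
/* The 32MiB chunk size mentioned for clear_contig_highpages(). */
#define SKETCH_CLEAR_CHUNK    (32UL << 20)

/* Size in bytes of the largest block the buddy allocator hands out. */
static unsigned long max_buddy_bytes(unsigned long page_size,
                                     unsigned int max_order)
{
    /* guard against shifting by the full width of unsigned long */
    assert(max_order < sizeof(unsigned long) * CHAR_BIT);
    return page_size << max_order;
}

/* Nonzero if a range of `bytes` fits within a single clearing chunk. */
static int fits_in_one_chunk(unsigned long bytes, unsigned long chunk)
{
    return bytes <= chunk;
}
```

Under those assumptions, max_buddy_bytes(SKETCH_PAGE_SIZE, SKETCH_MAX_PAGE_ORDER) is 4096 << 10 = 4MiB, an eighth of the 32MiB chunk, so a single buddy allocation never spans more than one chunk.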

Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>

--
Cheers,

David