Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism

From: David Laight

Date: Tue Jan 20 2026 - 14:30:31 EST


On Tue, 20 Jan 2026 13:18:19 -0500
Gregory Price <gourry@xxxxxxxxxx> wrote:

> On Tue, Jan 20, 2026 at 06:39:48PM +0800, Li Zhe wrote:
> > On Tue, 20 Jan 2026 09:47:44 +0000, david.laight.linux@xxxxxxxxx wrote:
> >
> > > On Tue, 20 Jan 2026 14:27:06 +0800
> > > "Li Zhe" <lizhe.67@xxxxxxxxxxxxx> wrote:
> > >
> > > > In light of the preceding discussion, we appear to have reached the
> > > > following understanding:
> > > >
> > > > (1) At present we prefer to mitigate slow application startup (e.g.,
> > > > VM creation) by zeroing huge pages at the moment they are freed
> > > > (init_on_free). The principal benefit is that user space gains the
> > > > performance improvement without deploying any additional user space
> > > > daemon.
> > >
> > > Am I missing something?
> > > If userspace does:
> > > $ program_a; program_b
> > > and pages used by program_a are zeroed when it exits you get the delay
> > > for zeroing all the pages it used before program_b starts.
> > > OTOH if the zeroing is deferred program_b only needs to zero the pages
> > > it needs to start (and there may be some lurking).
> >
> > Under the init_on_free approach, improving the speed of zeroing may
> > indeed prove necessary.
> >
> > However, I believe we should first reach consensus on adopting
> > “init_on_free” as the solution to slow application startup before
> > turning to performance tuning.
> >
>
> His point was init_on_free may not actually reduce any delays on serial
> applications, and can actually introduce additional delays.
>
> Example
> -------
> program_a: alloc_hugepages(10);
> exit();
>
> program_b: alloc_hugepages(5);
> exit();
>
> /* Run programs in serial */
> sh: program_a && program_b
>
> in zero_on_alloc():
> program_a eats zero(10) cost on startup
> program_b eats zero(5) cost on startup
> Overall zero(15) cost to start program_b
>
> in zero_on_free()
> program_a eats zero(10) cost on startup

Do you get that cost? Won't all the unused memory be zeros?

> program_a eats zero(10) cost on exit
> program_b eats zero(0) cost on startup
> Overall zero(20) cost to start program_b
>
> zero_on_free is worse by zero(5)
> -------
>
> This is a trivial example, but it's unclear zero_on_free actually
> provides a benefit. You have to know ahead of time what the runtime
> behavior, pre-zeroed count, and allocation pattern (0->10->5->...) would
> be to determine whether there's an actual reduction in startup time.
>
> But just trivially, starting from the base case of no pages being
> zeroed, you're just injecting an additional zero(X) cost if program_a()
> consumes more hugepages than program_b().
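
For anyone who wants to play with the accounting above, here is a rough
sketch of it. It is only a toy model under stated assumptions: the function
names and the 'prezeroed' parameter are made up for illustration, cost is
counted in "pages zeroed inline before the last program has started", and
pool-size limits are ignored. It is not how the kernel tracks anything.

```python
def cost_zero_on_alloc(demands):
    """Inline zeroing cost when every page is zeroed at allocation time."""
    # Each program pays for every hugepage it allocates.
    return sum(demands)


def cost_zero_on_free(demands, prezeroed=0):
    """Inline zeroing cost when pages are zeroed as they are freed.

    'prezeroed' is a hypothetical count of free pages that are already
    known to be zero before the first program starts.
    """
    zeroed = prezeroed
    cost = 0
    last = len(demands) - 1
    for i, pages in enumerate(demands):
        # Reuse known-zero pages first, zero the shortfall at allocation.
        reused = min(pages, zeroed)
        cost += pages - reused
        zeroed -= reused
        # Every program except the last exits before the last one starts,
        # and its pages are zeroed inline on exit.
        if i < last:
            cost += pages
            zeroed += pages
    return cost


if __name__ == "__main__":
    print(cost_zero_on_alloc([10, 5]))
    print(cost_zero_on_free([10, 5]))
    print(cost_zero_on_free([10, 5], prezeroed=10))
```

With demands of 10 then 5 and nothing pre-zeroed this gives 15 against 20,
matching the figures quoted above. Setting prezeroed=10 (the "unused memory
is already zero" case) drops the zero-on-free figure to 10, so the answer
depends entirely on the starting state and the allocation pattern.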

I'd consider a different test:
for c in $(jot 1000 1); do program_a; done

Regardless of whether you zero on alloc or on free, all the zeroing is inline.
Move it to a low priority thread (that uses a non-aggressive loop) and
there will be a reasonable chance of there being pre-zeroed pages available.
(Most DMA is far too aggressive...)
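
A back-of-envelope model of that loop, again only a sketch: the function
name and every parameter are hypothetical, and the background thread is
reduced to "zeroes at most N dirty free pages between two runs".

```python
def inline_cost_with_background_zeroing(runs, pages_per_run,
                                        bg_pages_between_runs, pool_size):
    """Pages zeroed inline over 'runs' back-to-back runs of one program.

    Assumes the program takes 'pages_per_run' hugepages from a pool of
    'pool_size' pages, frees them all on exit, and a background thread
    zeroes up to 'bg_pages_between_runs' dirty free pages before the
    next run starts.
    """
    zeroed = 0   # free pages currently known to be zero
    inline = 0   # pages the program had to zero itself
    for _ in range(runs):
        reused = min(pages_per_run, zeroed)
        inline += pages_per_run - reused
        zeroed -= reused
        # After exit the whole pool is free; whatever is not known-zero
        # is dirty and is a candidate for the background thread.
        dirty = pool_size - zeroed
        zeroed += min(bg_pages_between_runs, dirty)
    return inline


if __name__ == "__main__":
    for bg in (0, 4, 10):
        print(bg, inline_cost_with_background_zeroing(1000, 10, bg, 10))
```

With no background zeroing every run pays for all of its pages; once the
thread can keep up with the free rate only the very first run pays.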

If you zero on free it might also be a waste of time.
Maybe the memory is next used to read data from a disk file.

David

>
> Long way of saying the shift from alloc to free seems heuristic-y and
> you need stronger analysis / better data to show this change is actually
> beneficial in the general case.
>
> ~Gregory