Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism

From: David Hildenbrand (Red Hat)

Date: Wed Jan 14 2026 - 05:41:52 EST


On 1/13/26 13:41, Li Zhe wrote:
On Tue, 13 Jan 2026 11:15:29 +0100, david@xxxxxxxxxx wrote:

On 1/13/26 07:37, Li Zhe wrote:
On Mon, 12 Jan 2026 20:52:12 +0100, david@xxxxxxxxxx wrote:

As for concern (4), I believe it is orthogonal to this patchset, and
the cover letter already contains a performance comparison that
demonstrates the additional benefit.

I did see some comments in [1] about QEMU supporting user-mode
parallel zero-page operations; I'm just not sure what the current
state of that support looks like, or what the corresponding benchmark
numbers are.

As noted above, QEMU already employs a parallel page-touch mechanism,
yet the elapsed time remains noticeable. I am not deeply familiar with
QEMU; please correct me if I am mistaken.

I implemented some part of the parallel preallocation support in QEMU.

With QEMU, you can specify the number of threads and even specify the
NUMA-placement of these threads. So you can pretty much fine-tune that
for an environment.

You still pre-zero all hugetlb pages at VM startup time, just in
parallel. So you still pay some price at app startup time.

Hi David,

Thank you for the comprehensive explanation.

You are absolutely correct: QEMU's parallel preallocation is performed
only during VM start-up. We submitted this patch series mainly
because we observed that, even with the existing parallel mechanism,
launching large VMs still incurs prohibitive delays. (Bringing up
a 2 TB VM still requires more than 40 seconds for zeroing.)

If you know that you will run such a VM (or something else) later, you
could pre-zero the memory from user space by using a hugetlb-backed file
and supplying that to QEMU as memory backend for the VM. Then, you can
start your VM without any pre-zeroing.
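A minimal user-space sketch of that idea, for illustration only. The helper name prezero_file() and its parameters are hypothetical; it assumes `path` lives on a hugetlbfs mount, `step` is that mount's huge page size, and `size` is a multiple of it. Touching one byte per huge page is enough to fault the page in, since the kernel zeroes it at first fault.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: size the file at `path` to `size` bytes and fault in
 * every page by writing one zero byte per `step` bytes.
 * Assumption: `size` is a non-zero multiple of `step`, and `step` is the
 * huge page size of the mount when `path` is on hugetlbfs.
 * Returns 0 on success, -1 on error.
 */
static int prezero_file(const char *path, size_t size, size_t step)
{
    volatile unsigned char *p;
    void *map;
    size_t off;
    int fd;

    if (size == 0 || step == 0 || size % step != 0)
        return -1;

    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* One write per page triggers the fault; the value written is zero. */
    p = map;
    for (off = 0; off < size; off += step)
        p[off] = 0;

    munmap(map, size);
    close(fd);
    return 0;
}
```

The resulting file could then be handed to QEMU through a file-backed memory backend; how that is wired up depends on the deployment and is not shown here.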

I guess that approach should work universally. Of course, there are
limitations, as you would have to know how much memory an app needs, and
have a way to supply that memory in form of a file to that app.

Regarding user-space pre-zeroing, I agree that it is feasible once the
VM's memory footprint is known. We evaluated this approach internally;
however, in production environments, it is almost impossible to predict
the exact amount of memory a VM will require.

Of course, you could preallocate to the expected maximum and then
truncate the file to the size you need :)

The solution you described seems similar to delegating hugepage
management to a userspace daemon. I haven't explored this approach
before, but it appears quite complex. Beyond ensuring secure memory
isolation between VMs, we would also need to handle scenarios where
the management daemon or the QEMU process crashes, which implies
implementing robust recovery and memory reclamation mechanisms.

Yes, but I don't think that's particularly complicated. You have to remove the backing file, yes.

Do you happen to have any documentation or references regarding
userspace hugepage management that I could look into?

Not really any documentation. I pretty much only know how QEMU+libvirt ends up using it :)

Compared to the userspace approach, I wonder if implementing hugepage
pre-zeroing directly within the kernel would be a simpler and more
direct way to accelerate VM creation.

I mean, yes. I don't particularly enjoy user-space having to poll for pre-zeroing of pages ... it feels like an odd interface for something that is supposed to be simple.

I do understand the reasoning that "zeroing must be charged to somebody", and that using a kthread is a bit suboptimal as well.


Here is a thought: with "init_on_free", we charge zeroing of pages to whoever frees a page.

Can't we have a hugetlb mode where we zero hugetlb folios as they are getting freed back to the hugetlb allocator? IOW, we charge it to whoever puts the last reference.
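To illustrate the accounting only, here is a user-space toy model, not kernel code and not a proposed implementation. All names (struct zero_pool, pool_alloc(), pool_free()) are made up for this sketch. The freeing side pays for the memset(), so the allocating side can hand out a buffer without zeroing it.

```c
#include <stdlib.h>
#include <string.h>

#define POOL_BUFS 4
#define BUF_SIZE  4096

/* Toy pool: every buffer on the free list is known to be zeroed. */
struct zero_pool {
    unsigned char *free_list[POOL_BUFS];
    int nr_free;
};

/* Fill the pool with zeroed buffers. Returns 0 on success, -1 on error. */
static int pool_init(struct zero_pool *pool)
{
    pool->nr_free = 0;
    for (int i = 0; i < POOL_BUFS; i++) {
        unsigned char *buf = calloc(1, BUF_SIZE);

        if (!buf) {
            while (pool->nr_free > 0)
                free(pool->free_list[--pool->nr_free]);
            return -1;
        }
        pool->free_list[pool->nr_free++] = buf;
    }
    return 0;
}

/* Allocation is cheap: no zeroing here, the buffer is already clean. */
static unsigned char *pool_alloc(struct zero_pool *pool)
{
    if (pool->nr_free == 0)
        return NULL;
    return pool->free_list[--pool->nr_free];
}

/* The caller that returns the buffer pays for zeroing it. */
static void pool_free(struct zero_pool *pool, unsigned char *buf)
{
    if (pool->nr_free >= POOL_BUFS)
        return; /* only buffers handed out by this pool may come back */
    memset(buf, 0, BUF_SIZE);
    pool->free_list[pool->nr_free++] = buf;
}

/* Release all buffers currently on the free list. */
static void pool_destroy(struct zero_pool *pool)
{
    while (pool->nr_free > 0)
        free(pool->free_list[--pool->nr_free]);
}
```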

Just a thought, maybe it was discussed before ...

--
Cheers

David