Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism

From: Ankur Arora

Date: Thu Jan 15 2026 - 17:32:31 EST



David Hildenbrand (Red Hat) <david@xxxxxxxxxx> writes:

> On 1/15/26 21:16, dan.j.williams@xxxxxxxxx wrote:
>> David Hildenbrand (Red Hat) wrote:
>> [..]
>>>> Give me a list of 1Gig pages and this stuff becomes much more efficient
>>>> than anything the CPU can do.
>>>
>>> Right, and ideally we'd implement any such mechanisms in a way that more
>>> parts of the kernel can benefit, and not just an unloved in-memory
>>> file-system that most people just want to get rid of as soon as we can :)
>> CPUs have tended to eat the value of simple DMA offload operations like
>> copy/zero over time.
>> In the case of this patch there is no async-offload benefit because
>> userspace is already charged with spawning more threads if it wants more
>> parallelism.
>
> In this subthread we're discussing handling that in the kernel like
> init_on_free. So when user space frees a hugetlb folio (or in the
> future, other similarly gigantic folios from another allocator), we'd be zeroing
> it.
>
> If it would be freeing multiple such folios, we could pack them and send them to
> a DMA engine to zero them for us (concurrently? asynchronously? I don't know :)
> )

I've been thinking about using non-temporal instructions (movnt/clzero)
for zeroing in that path.

Both DMA-engine offload and non-temporal zeroing would also help
because neither pulls the freed buffers into the cache while zeroing them.

--
ankur