Re: [PATCH] mm/hugetlb: optionally pre-zero hugetlb pages
From: Joao Martins
Date: Tue Dec 03 2024 - 09:32:22 EST
On 03/12/2024 12:06, Michal Hocko wrote:
> On Mon 02-12-24 14:50:49, Frank van der Linden wrote:
>> On Mon, Dec 2, 2024 at 1:58 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>>> Any games with "background zeroing" are notoriously crappy and I would
>>> argue one should exhaust other avenues before going there -- at the end
>>> of the day the cost of zeroing will have to get paid.
>>
>> I understand that the concept of background prezeroing has been, and
>> will be, met with some resistance. But, do you have any specific
>> concerns with the patch I posted? It's pretty well isolated from the
>> rest of the code, and optional.
>
> The biggest concern I have is that the overhead is paid by everybody on
> the system - it is considered to be a system overhead even though only
> part of the workload benefits from hugetlb pages. In other words, the
> workload using those pages is not fully accounted for that use.
>
> If the startup latency is a real problem, is there a way to work around
> that in userspace by preallocating hugetlb pages ahead of time,
> before those VMs are launched, and handing over the already pre-allocated pages?
It should be relatively simple to actually do this. Mike and I experimented
with it ourselves a couple of years back, but we never had the chance to send
it out. IIRC, if we:
- add the PageZeroed tracking bit when a page is zeroed
- clear it in the write (fixup/non-fixup) fault-path
[somewhat similar to this series I suspect]
Then what's left is to change the lookup of free hugetlb pages
(dequeue_hugetlb_folio_node_exact() I think) to search first for non-zeroed
pages. Provided we don't track its 'cleared' state, there's no UAPI change in
behaviour. A daemon can just allocate/mmap+touch/etc. them read-only and
free them back 'as zeroed' to implement a userspace scrubber. And in principle
existing apps should see no difference. The amount of changes is consequently
significantly smaller (or so it looked in a quick PoC years back).
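
To make the idea concrete, here is a rough userspace model of the flag
lifecycle and the non-zeroed-first lookup. It is only a sketch, not kernel
code: struct model_page and the model_*() helpers are made-up names for
illustration, and the 'zeroed' field merely stands in for the PageZeroed bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a hugetlb page; not a kernel structure. */
struct model_page {
	bool free;	/* page sits in the free pool */
	bool zeroed;	/* stands in for the proposed PageZeroed bit */
};

/* Set the tracking bit once the page contents have been zeroed. */
static void model_mark_zeroed(struct model_page *p)
{
	p->zeroed = true;
}

/* A write fault means the contents are no longer known to be zero. */
static void model_write_fault(struct model_page *p)
{
	p->zeroed = false;
}

/* Freeing keeps the bit, so a read-only user frees the page 'as zeroed'. */
static void model_free(struct model_page *p)
{
	p->free = true;
}

/*
 * Dequeue a free page, preferring non-zeroed ones and falling back to the
 * first zeroed page. Returns the pool index, or -1 if no page is free.
 */
static int model_dequeue_nonzeroed_first(struct model_page *pool, size_t n)
{
	int fallback = -1;

	assert(pool != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!pool[i].free)
			continue;
		if (!pool[i].zeroed) {
			pool[i].free = false;
			return (int)i;
		}
		if (fallback < 0)
			fallback = (int)i;
	}
	if (fallback >= 0)
		pool[fallback].free = false;
	return fallback;
}
```

In this model a scrubber that dequeues, zeroes and only reads a page, then
frees it, leaves the page marked zeroed in the pool, while any writer clears
the bit before the page can be freed.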
Something extra on top would perhaps be the ability to select a lookup
heuristic such that we can pick the search method of
non-zero-first/only-nonzero/zeroed pages behind ioctl() (or a better generic
UAPI) to allow a scrubber to easily coexist with a hugepage user (e.g. a VMM, etc)
without too much of a dance.
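
As a rough illustration of what such a selectable heuristic could look like,
here is a small userspace model. Everything in it is hypothetical: the enum,
struct pol_page and the pol_*() helpers are invented names, no such UAPI
exists, and treating the 'zeroed' mode as zeroed-first with a fallback to
non-zeroed pages is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup policies; names invented for this sketch. */
enum pol_lookup_policy {
	POL_NONZERO_FIRST,	/* prefer non-zeroed, fall back to zeroed */
	POL_ONLY_NONZERO,	/* e.g. a scrubber looking for work */
	POL_ZEROED_FIRST,	/* e.g. a VMM; fallback is an assumption */
};

/* Hypothetical model of a hugetlb page; not a kernel structure. */
struct pol_page {
	bool free;
	bool zeroed;
};

/* Index of the first free page with the wanted zeroed state, or -1. */
static int pol_find(const struct pol_page *pool, size_t n, bool want_zeroed)
{
	for (size_t i = 0; i < n; i++)
		if (pool[i].free && pool[i].zeroed == want_zeroed)
			return (int)i;
	return -1;
}

/* Dequeue one page according to the policy; returns its index or -1. */
static int pol_dequeue(struct pol_page *pool, size_t n,
		       enum pol_lookup_policy policy)
{
	int idx;

	assert(pool != NULL);
	switch (policy) {
	case POL_ONLY_NONZERO:
		idx = pol_find(pool, n, false);
		break;
	case POL_ZEROED_FIRST:
		idx = pol_find(pool, n, true);
		if (idx < 0)
			idx = pol_find(pool, n, false);
		break;
	case POL_NONZERO_FIRST:
	default:
		idx = pol_find(pool, n, false);
		if (idx < 0)
			idx = pol_find(pool, n, true);
		break;
	}
	if (idx >= 0)
		pool[idx].free = false;
	return idx;
}
```

With only-nonzero the scrubber never takes pages that are already clean, so
it does not compete with a VMM that asks for zeroed pages first.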
Joao