Re: [PATCH RFC v3 01/19] mm: thread user_addr through page allocator for cache-friendly zeroing

From: Michael S. Tsirkin

Date: Wed Apr 22 2026 - 17:20:44 EST

On Wed, Apr 22, 2026 at 03:47:07PM -0400, Gregory Price wrote:
> On Tue, Apr 21, 2026 at 06:01:10PM -0400, Michael S. Tsirkin wrote:
> > Thread a user virtual address from vma_alloc_folio() down through
> > the page allocator to post_alloc_hook(). This is plumbing preparation
> > for a subsequent patch that will use user_addr to call folio_zero_user()
> > for cache-friendly zeroing of user pages.
> >
> > The user_addr is stored in struct alloc_context and flows through:
> > vma_alloc_folio -> folio_alloc_mpol -> __alloc_pages_mpol ->
> > __alloc_frozen_pages -> get_page_from_freelist -> prep_new_page ->
> > post_alloc_hook
> >
> > Public APIs (__alloc_pages, __folio_alloc, folio_alloc_mpol) gain a
> > user_addr parameter directly. Callers that do not need user_addr
> > pass USER_ADDR_NONE ((unsigned long)-1), since
> > address 0 is a valid user mapping.
> >
>
> Question: rather than churning the entirety of the existing interfaces,
> is there a possibility of adding an explicit interface for this
> interaction that amounts to:
>
> __alloc_user_pages(..., gfp_t gfp, user_addr)
> {
> BUG_ON(!(gfp & __GFP_ZERO));
>
> /* post_alloc_hook implements the already-zeroed skip */
> page = alloc_page(..., gfp, ...); /* existing interface */
>
> /* Do the cacheline stuff here instead of in the core */
> cacheline_nonsense(page, user_addr);
>
> return page; /* user doesn't need to do explicit zeroing */
> }
>
> Then rather than leaking information out of the buddy, we just need to
> get the zeroed information *into* the buddy.
>
> the users that want zeroing but need the explicit user_addr step just
> defer the zeroing to outside post_alloc_hook().
>
> That's just my immediate gut reaction to all this churn on the existing
> interfaces.
>
> Existing users can continue using the buddy as-is, and enlightened users
> can optimize for this specific kind of __GFP_ZERO interaction.
>
> ~Gregory

Hmm. Maybe I misunderstand what you propose, but this seems pretty close
to what v2 did: each callsite checked whether the page was pre-zeroed
and called folio_zero_user() itself. The feedback (from both you and
David) was that threading it through the allocator is better.

With a wrapper approach, it looks like we'd need something like
__GFP_SKIP_ZERO so that post_alloc_hook doesn't zero sequentially, and
then the wrapper zeroes the page with folio_zero_user() instead. But the
wrapper would also need to know whether the page was pre-zeroed
(PG_zeroed), and post_alloc_hook clears that flag before returning. So
the information doesn't survive to the wrapper.
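To make the ordering problem concrete, here is a minimal userspace model, not kernel code. model_post_alloc_hook(), MODEL_PG_ZEROED, MODEL_GFP_SKIP_ZERO and model_alloc_user_page() are hypothetical stand-ins for post_alloc_hook(), PG_zeroed, a new __GFP_SKIP_ZERO and the proposed wrapper. It assumes, as described above, that the hook clears the pre-zeroed hint before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; none of these are kernel definitions. */
#define MODEL_PG_ZEROED     (1UL << 0) /* stands in for PG_zeroed */
#define MODEL_GFP_ZERO      (1U << 0)  /* stands in for __GFP_ZERO */
#define MODEL_GFP_SKIP_ZERO (1U << 1)  /* stands in for a new __GFP_SKIP_ZERO */

struct model_page {
	unsigned long flags;
	unsigned char data[64];
};

/* Counts how many full zeroing passes were performed. */
static int zero_passes;

static void model_zero(struct model_page *page)
{
	memset(page->data, 0, sizeof(page->data));
	zero_passes++;
}

/*
 * Models post_alloc_hook(): it consumes the pre-zeroed hint and clears it
 * before returning, whether or not it zeroes the page itself.
 */
static void model_post_alloc_hook(struct model_page *page, unsigned int gfp)
{
	bool prezeroed = page->flags & MODEL_PG_ZEROED;

	page->flags &= ~MODEL_PG_ZEROED;
	if ((gfp & MODEL_GFP_ZERO) && !(gfp & MODEL_GFP_SKIP_ZERO) && !prezeroed)
		model_zero(page);
}

/*
 * Models the proposed wrapper: ask the core to skip zeroing, then zero
 * outside the allocator. The hint is already gone by now, so the wrapper
 * cannot tell a pre-zeroed page apart and must zero unconditionally.
 */
static void model_alloc_user_page(struct model_page *page,
				  unsigned long user_addr)
{
	(void)user_addr; /* would pick the zeroing order in folio_zero_user() */
	model_post_alloc_hook(page, MODEL_GFP_ZERO | MODEL_GFP_SKIP_ZERO);
	model_zero(page);
}
```

In this model, a page that was already pre-zeroed still costs one full zeroing pass when it goes through the wrapper, which is exactly the work the optimization is meant to avoid.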

We could return the zeroed hint via an output parameter, but that's
what v2's pghint_t was, and reviewers disliked it.

Threading user_addr through the allocator does add API churn, but it
is all mechanical (one added parameter; callers that don't need it pass
USER_ADDR_NONE), and any mistakes show up as build errors.
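A stripped-down sketch of the plumbing, for illustration only. It assumes a reduced context holding just gfp and user_addr (the real struct alloc_context has many more fields), and all model_* functions are hypothetical stand-ins for the chain listed in the patch description. user_addr rides along in the context from the entry point down to the hook, and the USER_ADDR_NONE sentinel keeps 0 usable as a real address.

```c
#include <assert.h>
#include <stdbool.h>

/* Sentinel from the patch description: address 0 is a valid user mapping. */
#define USER_ADDR_NONE ((unsigned long)-1)

/* Hypothetical, heavily reduced stand-in for struct alloc_context. */
struct model_alloc_context {
	unsigned int gfp;
	unsigned long user_addr;
};

/* Records what reached the bottom of the chain, for inspection. */
static unsigned long last_hook_addr;

static void model_post_alloc_hook(const struct model_alloc_context *ac)
{
	last_hook_addr = ac->user_addr;
}

static void model_prep_new_page(const struct model_alloc_context *ac)
{
	model_post_alloc_hook(ac);
}

static void model_get_page_from_freelist(const struct model_alloc_context *ac)
{
	model_prep_new_page(ac);
}

/* Entry point: as with the public APIs, user_addr is an explicit parameter. */
static void model_alloc_pages(unsigned int gfp, unsigned long user_addr)
{
	struct model_alloc_context ac = { .gfp = gfp, .user_addr = user_addr };

	model_get_page_from_freelist(&ac);
}

static bool model_has_user_addr(unsigned long user_addr)
{
	return user_addr != USER_ADDR_NONE;
}
```

Because user_addr is a positional parameter with a prototype, a caller that was not updated fails to compile with too few arguments instead of silently passing a bogus address.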

It also makes the zeroing path closer to being correct by construction:
every allocation either explicitly says it has no user address or
supplies a user_addr, and then gets cache-friendly zeroing or
skip-if-prezeroed, with no way for a callsite to forget to handle it.
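The "correct by construction" argument reduces to one decision point in the hook. A sketch with hypothetical names (enum zero_action, model_pick_zeroing()), assuming that a page known to be pre-zeroed is skipped whether or not a user address was given:

```c
#include <assert.h>
#include <stdbool.h>

#define USER_ADDR_NONE ((unsigned long)-1)

/* Hypothetical labels for what the hook decides to do. */
enum zero_action {
	ZERO_NONE,       /* caller did not ask for zeroing */
	ZERO_SKIP,       /* page is already known to be zero */
	ZERO_USER,       /* cache-friendly, e.g. via folio_zero_user() */
	ZERO_SEQUENTIAL, /* plain kernel zeroing */
};

/*
 * Single decision point. Every caller must pass user_addr (a real address
 * or USER_ADDR_NONE), so every zeroing allocation is classified here.
 */
static enum zero_action model_pick_zeroing(bool want_zero, bool prezeroed,
					   unsigned long user_addr)
{
	if (!want_zero)
		return ZERO_NONE;
	if (prezeroed)
		return ZERO_SKIP;
	if (user_addr != USER_ADDR_NONE)
		return ZERO_USER;
	return ZERO_SEQUENTIAL;
}
```

Note that address 0 selects the user-aware path, which is why the sentinel is (unsigned long)-1 rather than 0.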

Fundamentally, David told me I need to move folio_zero_user() into
post_alloc_hook as a prerequisite for the optimization, so I did that -
let's stick to it then, shall we?

This approach also fixes a pre-existing double-zeroing on architectures
with aliasing data caches when init_on_alloc is enabled: current code
zeroes the page once via kernel_init_pages() and then again via
clear_user_highpage() at the callsite. I don't see how the wrapper could
avoid that.
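A rough pass-counting model of that double zeroing, for a page that was not pre-zeroed. The boolean inputs are not real kernel config, and the functions are hypothetical. The old-flow rule that non-aliasing architectures skip the callsite clear when init_on_alloc is set is an assumption about today's callsites; model_new_flow() assumes the hook does exactly one user-aware clear when user_addr is set.

```c
#include <assert.h>
#include <stdbool.h>

#define USER_ADDR_NONE ((unsigned long)-1)

/* Counts zeroing passes over one page in each model. */
struct zero_count {
	int kernel_init; /* stands in for kernel_init_pages() */
	int user_clear;  /* stands in for clear_user_highpage()/folio_zero_user() */
};

static int model_total(struct zero_count c)
{
	return c.kernel_init + c.user_clear;
}

/*
 * Old flow as described: with init_on_alloc the allocator zeroes, and on
 * an aliasing-dcache architecture the callsite zeroes again (assumed to be
 * skipped on non-aliasing architectures when init_on_alloc already ran).
 */
static struct zero_count model_old_flow(bool init_on_alloc, bool aliasing_dcache)
{
	struct zero_count c = { 0, 0 };

	if (init_on_alloc)
		c.kernel_init++;
	if (aliasing_dcache || !init_on_alloc)
		c.user_clear++;
	return c;
}

/* New flow: the hook zeroes exactly once, user-aware if it has an address. */
static struct zero_count model_new_flow(unsigned long user_addr)
{
	struct zero_count c = { 0, 0 };

	if (user_addr != USER_ADDR_NONE)
		c.user_clear++;
	else
		c.kernel_init++;
	return c;
}
```

With init_on_alloc on an aliasing-dcache architecture, the old model counts two passes and the new model counts one.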

--
MST