Re: revisiting alloc_pages_bulks semantics?
From: Chuck Lever
Date: Wed May 27 2026 - 09:59:33 EST
On 5/27/26 8:19 AM, Christoph Hellwig wrote:
> On Wed, May 27, 2026 at 12:06:08PM +0200, Vlastimil Babka (SUSE) wrote:
>>> alloc_pages_bulks can do partial allocations for some reasons, and
>>> users usually have a fallback by either looping and calling it again
>>> or falling back to single page allocations. This sucks! Why can't
>>> we get our usual try as hard as you can semantics, requiring
>>> GFP_NORETRY or similar to relax it?
>>
>> If we do that, do we keep the possibility of partial success, i.e. return
>> how many were allocated? Seems wasteful to suceed N-1 and then throw all
>> away, if the caller can use a fallback only for the last one.
>> Do some callers need all-or-nothing semantics? Should a flag indicate which
>> one to use?
>
> A lot of callers (but not all) need all or nothing semantics. But
> freeing already allocated pages is the not a major problem - the caller
> just has to add a release_pages call if it didn't already have one
> for cleaning up later failures.
What the svc/nfsd thread is trying to avoid is sleeping uninterruptibly
waiting for memory resources. That stalls server shutdown, among other
things.
Guaranteeing forward progress is the bottom line.
>>> There is one single user (svc_fill_pages in sunrpc) that relies on it.
>>> For everyone else it creates extra burden and is very error prone
>>> (speaking from experience).
>>
>> Sounds good to me. Will sunrpc be easy to convert, or should it be another
>> flag to opt-in to the current behavior, that it would use?
>
> I've added Chuck to the Cc list, but from memory sunrpc actually does
> make use of this feature and he objected to previous attempts to
> change it. So a first step would be to have a lower-level helper
> that works as-is and a wrapper that zeroes the array, even if that
> doesn't feel as efficient as it could be.
If sunrpc is the only user, it might be sensible to hoist the "zero
fill" capability into sunrpc.ko.
The impact of walking the whole array, for this check, is measurable,
and I've got a patch in 7.1 to mitigate that:
commit d7f3efd9ff474867b04e1ea784690f02450a245b
(refs/patches/nfsd-fixes/sunrpc-optimize-rq_respages-allocation-in-svc_alloc_arg)
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
AuthorDate: Thu Feb 26 09:47:39 2026 -0500
Commit: Chuck Lever <chuck.lever@xxxxxxxxxx>
CommitDate: Sun Mar 29 21:25:09 2026 -0400
SUNRPC: Optimize rq_respages allocation in svc_alloc_arg
svc_alloc_arg() invokes alloc_pages_bulk() with the full rq_maxpages
count (~259 for 1MB messages) for the rq_respages array, causing a
full-array scan despite most slots holding valid pages.
svc_rqst_release_pages() NULLs only the range
[rq_respages, rq_next_page)
after each RPC, so only that range contains NULL entries. Limit the
rq_respages fill in svc_alloc_arg() to that range instead of
scanning the full array.
svc_init_buffer() initializes rq_next_page to span the entire
rq_respages array, so the first svc_alloc_arg() call fills all
slots.
Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
--
Chuck Lever