Re: [PATCH 2/5] mm/page_alloc: Add a bulk page allocator

From: Mel Gorman
Date: Fri Mar 12 2021 - 11:05:07 EST


On Fri, Mar 12, 2021 at 02:58:14PM +0000, Matthew Wilcox wrote:
> On Fri, Mar 12, 2021 at 12:46:09PM +0100, Jesper Dangaard Brouer wrote:
> > In my page_pool patch I'm bulk allocating 64 pages. I wanted to ask if
> > this is too much? (PP_ALLOC_CACHE_REFILL=64).
> >
> > The mlx5 driver has a while loop for allocating 64 pages, which is
> > used in this case; that is why 64 was chosen. If we choose a lower
> > bulk number, then the bulk-alloc will just be called more times.
>
> The thing about batching is that smaller batches are often better.
> Let's suppose you need to allocate 100 pages for something, and the page
> allocator takes up 90% of your latency budget. Batching just ten pages
> at a time is going to reduce the overhead to 9%. Going to 64 pages
> reduces the overhead from 9% to 2% -- maybe that's important, but
> possibly not.
>

I do not think that something like that can be properly assessed in
advance. It heavily depends on whether the caller is willing to amortise
the cost of the batch allocation or if the timing of the bulk request is
critical every single time.
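
For reference, the arithmetic in the quoted example can be reproduced with
a very simple model. This is only a sketch under a loud assumption: the
allocator cost is treated as a fixed per-call overhead and nothing else.
The function name and parameters are hypothetical and are not part of the
patch series.

```c
#include <stddef.h>

/*
 * Illustrative model only (an assumption, not a measurement): treat the
 * allocator cost as a fixed per-call overhead, so that allocating one
 * page per call costs single_page_pct percent of the latency budget.
 * Batching then scales that cost by the number of calls made.
 */
static double batched_overhead_pct(size_t nr_pages, size_t batch,
				   double single_page_pct)
{
	size_t calls;

	if (nr_pages == 0 || batch == 0)
		return 0.0;	/* nothing to model */

	calls = (nr_pages + batch - 1) / batch;	/* round up */
	return single_page_pct * (double)calls / (double)nr_pages;
}
```

With nr_pages = 100 and single_page_pct = 90, a batch of 10 gives 9% and
a batch of 64 gives 1.8%, in line with the rounded figures quoted above.
Whether a real caller behaves like this model is exactly what cannot be
known in advance.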

> > The result of the API is to deliver pages as a double-linked list via
> > LRU (page->lru member). If you are planning to use llist, then how
> > would this API change be handled later?
> >
> > Have you noticed that the two users store the struct-page pointers in an
> > array? We could have the caller provide the array to store struct-page
> > pointers, like we do with the kmem_cache_alloc_bulk API.
>
> My preference would be for a pagevec. That does limit you to 15 pages
> per call [1], but I do think that might be enough. And the overhead of
> manipulating a linked list isn't free.
>

I'm opposed to a pagevec because it unnecessarily limits the caller. The
sunrpc user for example knows how many pages it needs at the time the bulk
allocator is called but it's not the same value every time. When tracing,
I found it sometimes requested 1 page (most common request actually) and
other times requested 200+ pages. Forcing it to call the batch allocator
in chunks of 15 means the caller incurs the cost of multiple allocation
requests which is almost as bad as calling __alloc_pages in a loop.
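
As a rough illustration of the chunking cost, the number of allocator
calls a capped interface forces on the caller is just a ceiling division.
The helper below is hypothetical and only models the call count; it does
not call into the page allocator.

```c
#include <stddef.h>

/*
 * Hypothetical model: number of allocator calls needed to obtain
 * nr_pages when each call can return at most max_per_call pages.
 */
static size_t calls_needed(size_t nr_pages, size_t max_per_call)
{
	if (max_per_call == 0)
		return 0;	/* model only: no progress is possible */

	return (nr_pages + max_per_call - 1) / max_per_call;	/* round up */
}
```

For a 200-page request, calls_needed(200, 15) is 14 whereas an uncapped
interface needs a single call. The common 1-page request costs one call
either way, so the cap only hurts the large requests.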

I think the first version should have an easy API to start with. Optimise
the implementation if it is a bottleneck. Only make the API harder to
use if the callers are really willing to always allocate and size the
array in advance and it's shown that it really makes a big difference
performance-wise.

--
Mel Gorman
SUSE Labs