Re: [PATCH 4/4] mm, page_alloc: Add a bulk page allocator

From: Mel Gorman
Date: Tue Jan 10 2017 - 03:34:11 EST


On Tue, Jan 10, 2017 at 12:00:27PM +0800, Hillf Danton wrote:
> > It shows a roughly 50-60% reduction in the cost of allocating pages.
> > The free paths are not improved as much but relatively little can be batched
> > there. It's not quite as fast as it could be but taking further shortcuts
> > would require making a lot of assumptions about the state of the page and
> > the context of the caller.
> >
> > Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> > ---
> Acked-by: Hillf Danton <hillf.zj@xxxxxxxxxxxxxxx>
>

Thanks.

> > @@ -2485,7 +2485,7 @@ void free_hot_cold_page(struct page *page, bool cold)
> > }
> >
> > /*
> > - * Free a list of 0-order pages
> > + * Free a list of 0-order pages whose reference count is already zero.
> > */
> > void free_hot_cold_page_list(struct list_head *list, bool cold)
> > {
> > @@ -2495,7 +2495,28 @@ void free_hot_cold_page_list(struct list_head *list, bool cold)
> > trace_mm_page_free_batched(page, cold);
> > free_hot_cold_page(page, cold);
> > }
> > +
> > + INIT_LIST_HEAD(list);
>
> Nit: can we cut this overhead off?

Yes, but note that any caller of free_hot_cold_page_list() would then be
required to reinit the list themselves or it'll cause list corruption.
It's unlikely that a user of the bulk interface will handle the refcounts
itself and still be able to use this free interface properly, but if one
does, it needs to either reinit the list or have this hunk added back.

As it happens, none of the current callers care.
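
For illustration, if the INIT_LIST_HEAD() hunk were dropped, a hypothetical
caller that really did manage the refcounts would have to do the reinit
itself, along these lines (a sketch only, not code from the patch;
BATCH_SIZE is made up):

	struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL);
	LIST_HEAD(pagelist);
	unsigned long nr;

	nr = __alloc_pages_bulk_nodemask(GFP_KERNEL, 0, zonelist, NULL,
					 BATCH_SIZE, &pagelist);

	/* ... use the nr pages, then drop the references so they can be freed ... */

	free_hot_cold_page_list(&pagelist, false);

	/*
	 * Without the INIT_LIST_HEAD() inside free_hot_cold_page_list(),
	 * the entries hanging off pagelist are stale at this point and
	 * the caller must reinit before reusing the list head.
	 */
	INIT_LIST_HEAD(&pagelist);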

> > /*
> > * split_page takes a non-compound higher-order page, and splits it into
> > @@ -3887,6 +3908,99 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> > EXPORT_SYMBOL(__alloc_pages_nodemask);
> >
> > /*
> > + * This is a batched version of the page allocator that attempts to
> > + * allocate nr_pages quickly from the preferred zone and add them to list.
> > + * Note that there is no guarantee that nr_pages will be allocated although
> > + * every effort will be made to allocate at least one. Unlike the core
> > + * allocator, no special effort is made to recover from transient
> > + * failures caused by changes in cpusets. It should only be used from !IRQ
> > + * context. An attempt to allocate a batch of pages from an interrupt
> > + * will allocate a single page.
> > + */
> > +unsigned long
> > +__alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
> > + struct zonelist *zonelist, nodemask_t *nodemask,
> > + unsigned long nr_pages, struct list_head *alloc_list)
> > +{
> > + struct page *page;
> > + unsigned long alloced = 0;
> > + unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > + struct zone *zone;
> > + struct per_cpu_pages *pcp;
> > + struct list_head *pcp_list;
> > + int migratetype;
> > + gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */
> > + struct alloc_context ac = { };
> > + bool cold = ((gfp_mask & __GFP_COLD) != 0);
> > +
> > + /* If there are already pages on the list, don't bother */
> > + if (!list_empty(alloc_list))
> > + return 0;
>
> Nit: can we move the check to the call site?

Yes, but it makes the API slightly more hazardous to use.
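
Roughly, every call site would then have to carry the guard itself,
something like this (a caller-side sketch, not code from the patch):

	/* The allocator would no longer check this for us */
	if (list_empty(&pagelist))
		alloced = __alloc_pages_bulk_nodemask(gfp_mask, 0, zonelist,
						      nodemask, nr_pages,
						      &pagelist);

Forgetting that guard at a single call site means pages get added on top
of whatever is already on the list, which is exactly the sort of surprise
the in-allocator check avoids.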

> > +
> > + /* Only handle bulk allocation of order-0 */
> > + if (order || in_interrupt())
> > + goto failed;
>
> Ditto
>

Same here: if the caller is in interrupt context, there is a slight risk
that they'll corrupt the list in a manner that would be tricky to catch.
The checks are there to minimise the risk of surprising behaviour.
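
The failed label is not in the quoted context; reconstructed from the
comment above ("will allocate a single page"), it would look roughly like
this, falling back to a single order-0 page from the normal allocator
(a sketch, not the verbatim hunk):

failed:
	page = __alloc_pages_nodemask(gfp_mask, order, zonelist, nodemask);
	if (page) {
		list_add(&page->lru, alloc_list);
		alloced = 1;
	}
	return alloced;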

--
Mel Gorman
SUSE Labs