Re: [PATCH 4/4] mm, page_alloc: Add a bulk page allocator

From: Hillf Danton
Date: Mon Jan 09 2017 - 23:00:55 EST


On Tuesday, January 10, 2017 12:35 AM Mel Gorman wrote:
>
> This patch adds a new page allocator interface via alloc_pages_bulk,
> __alloc_pages_bulk and __alloc_pages_bulk_nodemask. A caller requests a
> number of pages to be allocated and added to a list. They can be freed in
> bulk using free_pages_bulk(). Note that it would theoretically be possible
> to use free_hot_cold_page_list for faster frees if the symbol was exported,
> the refcounts were 0 and the caller guaranteed it was not in an interrupt.
> This would be significantly faster in the free path but also more unsafer
> and a harder API to use.
>
> The API is not guaranteed to return the requested number of pages and
> may fail if the preferred allocation zone has limited free memory, the
> cpuset changes during the allocation or page debugging decides to fail
> an allocation. It's up to the caller to request more pages in batch if
> necessary.
>
> The following compares the allocation cost per page for different batch
> sizes. The baseline is allocating them one at a time and it compares with
> the performance when using the new allocation interface.
>
> pagealloc
> 4.10.0-rc2 4.10.0-rc2
> one-at-a-time bulk-v2
> Amean alloc-odr0-1 259.54 ( 0.00%) 106.62 ( 58.92%)
> Amean alloc-odr0-2 193.38 ( 0.00%) 76.38 ( 60.50%)
> Amean alloc-odr0-4 162.38 ( 0.00%) 57.23 ( 64.76%)
> Amean alloc-odr0-8 144.31 ( 0.00%) 48.77 ( 66.20%)
> Amean alloc-odr0-16 134.08 ( 0.00%) 45.38 ( 66.15%)
> Amean alloc-odr0-32 128.62 ( 0.00%) 42.77 ( 66.75%)
> Amean alloc-odr0-64 126.00 ( 0.00%) 41.00 ( 67.46%)
> Amean alloc-odr0-128 125.00 ( 0.00%) 40.08 ( 67.94%)
> Amean alloc-odr0-256 136.62 ( 0.00%) 56.00 ( 59.01%)
> Amean alloc-odr0-512 152.00 ( 0.00%) 69.00 ( 54.61%)
> Amean alloc-odr0-1024 158.00 ( 0.00%) 76.23 ( 51.75%)
> Amean alloc-odr0-2048 163.00 ( 0.00%) 81.15 ( 50.21%)
> Amean alloc-odr0-4096 169.77 ( 0.00%) 85.92 ( 49.39%)
> Amean alloc-odr0-8192 170.00 ( 0.00%) 88.00 ( 48.24%)
> Amean alloc-odr0-16384 170.00 ( 0.00%) 89.00 ( 47.65%)
> Amean free-odr0-1 88.69 ( 0.00%) 55.69 ( 37.21%)
> Amean free-odr0-2 66.00 ( 0.00%) 49.38 ( 25.17%)
> Amean free-odr0-4 54.23 ( 0.00%) 45.38 ( 16.31%)
> Amean free-odr0-8 48.23 ( 0.00%) 44.23 ( 8.29%)
> Amean free-odr0-16 47.00 ( 0.00%) 45.00 ( 4.26%)
> Amean free-odr0-32 44.77 ( 0.00%) 43.92 ( 1.89%)
> Amean free-odr0-64 44.00 ( 0.00%) 43.00 ( 2.27%)
> Amean free-odr0-128 43.00 ( 0.00%) 43.00 ( 0.00%)
> Amean free-odr0-256 60.69 ( 0.00%) 60.46 ( 0.38%)
> Amean free-odr0-512 79.23 ( 0.00%) 76.00 ( 4.08%)
> Amean free-odr0-1024 86.00 ( 0.00%) 85.38 ( 0.72%)
> Amean free-odr0-2048 91.00 ( 0.00%) 91.23 ( -0.25%)
> Amean free-odr0-4096 94.85 ( 0.00%) 95.62 ( -0.81%)
> Amean free-odr0-8192 97.00 ( 0.00%) 97.00 ( 0.00%)
> Amean free-odr0-16384 98.00 ( 0.00%) 97.46 ( 0.55%)
> Amean total-odr0-1 348.23 ( 0.00%) 162.31 ( 53.39%)
> Amean total-odr0-2 259.38 ( 0.00%) 125.77 ( 51.51%)
> Amean total-odr0-4 216.62 ( 0.00%) 102.62 ( 52.63%)
> Amean total-odr0-8 192.54 ( 0.00%) 93.00 ( 51.70%)
> Amean total-odr0-16 181.08 ( 0.00%) 90.38 ( 50.08%)
> Amean total-odr0-32 173.38 ( 0.00%) 86.69 ( 50.00%)
> Amean total-odr0-64 170.00 ( 0.00%) 84.00 ( 50.59%)
> Amean total-odr0-128 168.00 ( 0.00%) 83.08 ( 50.55%)
> Amean total-odr0-256 197.31 ( 0.00%) 116.46 ( 40.97%)
> Amean total-odr0-512 231.23 ( 0.00%) 145.00 ( 37.29%)
> Amean total-odr0-1024 244.00 ( 0.00%) 161.62 ( 33.76%)
> Amean total-odr0-2048 254.00 ( 0.00%) 172.38 ( 32.13%)
> Amean total-odr0-4096 264.62 ( 0.00%) 181.54 ( 31.40%)
> Amean total-odr0-8192 267.00 ( 0.00%) 185.00 ( 30.71%)
> Amean total-odr0-16384 268.00 ( 0.00%) 186.46 ( 30.42%)
>
> It shows a roughly 50-60% reduction in the cost of allocating pages.
> The free paths are not improved as much but relatively little can be batched
> there. It's not quite as fast as it could be but taking further shortcuts
> would require making a lot of assumptions about the state of the page and
> the context of the caller.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> ---
Acked-by: Hillf Danton <hillf.zj@xxxxxxxxxxxxxxx>

> include/linux/gfp.h | 24 +++++++++++
> mm/page_alloc.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 139 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 4175dca4ac39..b2fe171ee1c4 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -433,6 +433,29 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order,
> return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
> }
>
> +unsigned long
> +__alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
> + struct zonelist *zonelist, nodemask_t *nodemask,
> + unsigned long nr_pages, struct list_head *alloc_list);
> +
> +static inline unsigned long
> +__alloc_pages_bulk(gfp_t gfp_mask, unsigned int order,
> + struct zonelist *zonelist, unsigned long nr_pages,
> + struct list_head *list)
> +{
> + return __alloc_pages_bulk_nodemask(gfp_mask, order, zonelist, NULL,
> + nr_pages, list);
> +}
> +
> +static inline unsigned long
> +alloc_pages_bulk(gfp_t gfp_mask, unsigned int order,
> + unsigned long nr_pages, struct list_head *list)
> +{
> + int nid = numa_mem_id();
> + return __alloc_pages_bulk(gfp_mask, order,
> + node_zonelist(nid, gfp_mask), nr_pages, list);
> +}
> +
> /*
> * Allocate pages, preferring the node given as nid. The node must be valid and
> * online. For more general interface, see alloc_pages_node().
> @@ -504,6 +527,7 @@ extern void __free_pages(struct page *page, unsigned int order);
> extern void free_pages(unsigned long addr, unsigned int order);
> extern void free_hot_cold_page(struct page *page, bool cold);
> extern void free_hot_cold_page_list(struct list_head *list, bool cold);
> +extern void free_pages_bulk(struct list_head *list);
>
> struct page_frag_cache;
> extern void __page_frag_drain(struct page *page, unsigned int order,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 232cadbe9231..4f142270fbf0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2485,7 +2485,7 @@ void free_hot_cold_page(struct page *page, bool cold)
> }
>
> /*
> - * Free a list of 0-order pages
> + * Free a list of 0-order pages whose reference count is already zero.
> */
> void free_hot_cold_page_list(struct list_head *list, bool cold)
> {
> @@ -2495,7 +2495,28 @@ void free_hot_cold_page_list(struct list_head *list, bool cold)
> trace_mm_page_free_batched(page, cold);
> free_hot_cold_page(page, cold);
> }
> +
> + INIT_LIST_HEAD(list);

Nit: can we cut this overhead off?
> +}
> +
> +/* Drop reference counts and free pages from a list */
> +void free_pages_bulk(struct list_head *list)
> +{
> + struct page *page, *next;
> + bool free_percpu = !in_interrupt();
> +
> + list_for_each_entry_safe(page, next, list, lru) {
> + trace_mm_page_free_batched(page, 0);
> + if (put_page_testzero(page)) {
> + list_del(&page->lru);
> + if (free_percpu)
> + free_hot_cold_page(page, false);
> + else
> + __free_pages_ok(page, 0);
> + }
> + }
> }
> +EXPORT_SYMBOL_GPL(free_pages_bulk);
>
> /*
> * split_page takes a non-compound higher-order page, and splits it into
> @@ -3887,6 +3908,99 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
> EXPORT_SYMBOL(__alloc_pages_nodemask);
>
> /*
> + * This is a batched version of the page allocator that attempts to
> + * allocate nr_pages quickly from the preferred zone and add them to list.
> + * Note that there is no guarantee that nr_pages will be allocated although
> + * every effort will be made to allocate at least one. Unlike the core
> + * allocator, no special effort is made to recover from transient
> + * failures caused by changes in cpusets. It should only be used from !IRQ
> + * context. An attempt to allocate a batch of patches from an interrupt
> + * will allocate a single page.
> + */
> +unsigned long
> +__alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
> + struct zonelist *zonelist, nodemask_t *nodemask,
> + unsigned long nr_pages, struct list_head *alloc_list)
> +{
> + struct page *page;
> + unsigned long alloced = 0;
> + unsigned int alloc_flags = ALLOC_WMARK_LOW;
> + struct zone *zone;
> + struct per_cpu_pages *pcp;
> + struct list_head *pcp_list;
> + int migratetype;
> + gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */
> + struct alloc_context ac = { };
> + bool cold = ((gfp_mask & __GFP_COLD) != 0);
> +
> + /* If there are already pages on the list, don't bother */
> + if (!list_empty(alloc_list))
> + return 0;

Nit: can we move the check to the call site?
> +
> + /* Only handle bulk allocation of order-0 */
> + if (order || in_interrupt())
> + goto failed;

Ditto

Hillf