Re: [PATCH v4 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free

From: Vlastimil Babka
Date: Mon Mar 12 2018 - 10:23:01 EST


On 03/01/2018 07:28 AM, Aaron Lu wrote:
> When freeing a batch of pages from Per-CPU-Pages(PCP) back to buddy,
> the zone->lock is held and then pages are chosen from PCP's migratetype
> list. While there is actually no need to do this 'choose part' under
> lock since it's PCP pages, the only CPU that can touch them is us and
> irq is also disabled.
>
> Moving this part outside could reduce lock held time and improve
> performance. Test with will-it-scale/page_fault1 full load:
>
> kernel Broadwell(2S) Skylake(2S) Broadwell(4S) Skylake(4S)
> v4.16-rc2+ 9034215 7971818 13667135 15677465
> this patch 9536374 +5.6% 8314710 +4.3% 14070408 +3.0% 16675866 +6.4%
>
> What the test does is: starts $nr_cpu processes and each will repeatedly
> do the following for 5 minutes:
> 1 mmap 128M anonymouse space;
> 2 write access to that space;
> 3 munmap.
> The score is the aggregated iteration.
>
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c
>
> Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
> ---
> mm/page_alloc.c | 39 +++++++++++++++++++++++----------------
> 1 file changed, 23 insertions(+), 16 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index faa33eac1635..dafdcdec9c1f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1116,12 +1116,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> int migratetype = 0;
> int batch_free = 0;
> bool isolated_pageblocks;
> -
> - spin_lock(&zone->lock);
> - isolated_pageblocks = has_isolate_pageblock(zone);
> + struct page *page, *tmp;
> + LIST_HEAD(head);
>
> while (count) {
> - struct page *page;
> struct list_head *list;
>
> /*
> @@ -1143,27 +1141,36 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> batch_free = count;
>
> do {
> - int mt; /* migratetype of the to-be-freed page */
> -
> page = list_last_entry(list, struct page, lru);
> - /* must delete as __free_one_page list manipulates */
> + /* must delete to avoid corrupting pcp list */
> list_del(&page->lru);

Well, since bulkfree_pcp_prepare() doesn't care about page->lru, you
could maybe use list_move_tail() instead of list_del() +
list_add_tail()? That avoids temporarily writing poison values.

Hm actually, you are reversing the list in the process, because page is
obtained by list_last_entry and you use list_add_tail. That could have
unintended performance consequences?

Also maybe list_cut_position() could be faster than shuffling pages one
by one? I guess not really, because batch_free will be generally low?

> pcp->count--;
>
> - mt = get_pcppage_migratetype(page);
> - /* MIGRATE_ISOLATE page should not go to pcplists */
> - VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> - /* Pageblock could have been isolated meanwhile */
> - if (unlikely(isolated_pageblocks))
> - mt = get_pageblock_migratetype(page);
> -
> if (bulkfree_pcp_prepare(page))
> continue;
>
> - __free_one_page(page, page_to_pfn(page), zone, 0, mt);
> - trace_mm_page_pcpu_drain(page, 0, mt);
> + list_add_tail(&page->lru, &head);
> } while (--count && --batch_free && !list_empty(list));
> }
> +
> + spin_lock(&zone->lock);
> + isolated_pageblocks = has_isolate_pageblock(zone);
> +
> + /*
> + * Use safe version since after __free_one_page(),
> + * page->lru.next will not point to original list.
> + */
> + list_for_each_entry_safe(page, tmp, &head, lru) {
> + int mt = get_pcppage_migratetype(page);
> + /* MIGRATE_ISOLATE page should not go to pcplists */
> + VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> + /* Pageblock could have been isolated meanwhile */
> + if (unlikely(isolated_pageblocks))
> + mt = get_pageblock_migratetype(page);
> +
> + __free_one_page(page, page_to_pfn(page), zone, 0, mt);
> + trace_mm_page_pcpu_drain(page, 0, mt);
> + }
> spin_unlock(&zone->lock);
> }
>
>