Re: [PATCH 28/28] mm, page_alloc: Defer debugging checks of pages allocated from the PCP

From: Mel Gorman
Date: Wed May 18 2016 - 04:49:44 EST


On Wed, May 18, 2016 at 09:51:58AM +0200, Vlastimil Babka wrote:
> On 05/17/2016 08:41 AM, Naoya Horiguchi wrote:
> >> @@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> >> struct list_head *list;
> >>
> >> local_irq_save(flags);
> >> - pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> - list = &pcp->lists[migratetype];
> >> - if (list_empty(list)) {
> >> - pcp->count += rmqueue_bulk(zone, 0,
> >> - pcp->batch, list,
> >> - migratetype, cold);
> >> - if (unlikely(list_empty(list)))
> >> - goto failed;
> >> - }
> >> + do {
> >> + pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> + list = &pcp->lists[migratetype];
> >> + if (list_empty(list)) {
> >> + pcp->count += rmqueue_bulk(zone, 0,
> >> + pcp->batch, list,
> >> + migratetype, cold);
> >> + if (unlikely(list_empty(list)))
> >> + goto failed;
> >> + }
> >>
> >> - if (cold)
> >> - page = list_last_entry(list, struct page, lru);
> >> - else
> >> - page = list_first_entry(list, struct page, lru);
> >> + if (cold)
> >> + page = list_last_entry(list, struct page, lru);
> >> + else
> >> + page = list_first_entry(list, struct page, lru);
> >> + } while (page && check_new_pcp(page));
> >
> > This causes an infinite loop when check_new_pcp() returns 1, because the bad
> > page is still in the list (I assume that a bad page never disappears).
> > The original kernel is free from this problem because we do retry after
> > list_del(). So would moving the following 3 lines into this do-while block
> > solve the problem?
> >
> > __dec_zone_state(zone, NR_ALLOC_BATCH);
> > list_del(&page->lru);
> > pcp->count--;
> >
> > There seems to be no infinite loop issue in the order > 0 block below, because
> > bad pages are removed from the free list in __rmqueue_smallest().
>
> Ooops, thanks for catching this, wish it was sooner...
>

Still not too late fortunately! Thanks Naoya for identifying this and
Vlastimil for fixing it.

> ----8<----
> From f52f5e2a7dd65f2814183d8fd254ace43120b828 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@xxxxxxx>
> Date: Wed, 18 May 2016 09:41:01 +0200
> Subject: [PATCH] mm, page_alloc: prevent infinite loop in buffered_rmqueue()
>
> In a DEBUG_VM kernel, we can hit an infinite loop for order == 0 in
> buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is
> never removed from the pcp list. Fix this by removing the page before retrying.
> Also, we don't need to check whether page is non-NULL, because we simply take
> it from a list that was just tested for being non-empty.
>
> Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-freed-pages-until-a-pcp-drain.patch
> Reported-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

Reviewed-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

--
Mel Gorman
SUSE Labs