On Thu, Dec 01, 2016 at 02:41:29PM +0100, Vlastimil Babka wrote:
On 12/01/2016 01:24 AM, Mel Gorman wrote:
...
Hmm I think that if this hits, we don't decrease count/increase nr_freed and
pcp->count will become wrong.
Ok, I think you're right but I also think it's relatively trivial to fix
with
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 94808f565f74..8777aefc1b8e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1134,13 +1134,13 @@ static void free_pcppages_bulk(struct zone *zone, int count,
if (unlikely(isolated_pageblocks))
mt = get_pageblock_migratetype(page);
+ nr_freed += (1 << order);
+ count -= (1 << order);
if (bulkfree_pcp_prepare(page))
continue;
__free_one_page(page, page_to_pfn(page), zone, order, mt);
trace_mm_page_pcpu_drain(page, order, mt);
- nr_freed += (1 << order);
- count -= (1 << order);
} while (count > 0 && --batch_free && !list_empty(list));
}
spin_unlock(&zone->lock);
And if we are unlucky/doing full drain, all
lists will get empty, but as count stays e.g. 1, we loop forever on the
outer while()?
Potentially yes. Granted the system is already in a bad state as pages
are being freed in a bad or unknown state but we haven't halted the
system for that in the past.
BTW, I think there's a similar problem (but not introduced by this patch) in
rmqueue_bulk() and its
if (unlikely(check_pcp_refill(page)))
continue;
Potentially yes. It's outside the scope of this patch but it needs
fixing.
If you agree with the above fix, I'll roll it into a v5 and append
another patch for this issue.