Re: [PATCH] mm/page_isolation: let caller take the zone lock for test_pages_isolated

From: Joonsoo Kim
Date: Thu Mar 17 2016 - 13:18:36 EST


2016-03-17 1:49 GMT+09:00 Lucas Stach <l.stach@xxxxxxxxxxxxxx>:
> This fixes an annoying race in the CMA code leading to lots of "PFNs busy"
> messages when CMA is used concurrently. This is harmless normally as CMA
> will just retry the allocation at a different place, but it might lead to
> increased fragmentation of the CMA area as well as failing allocations
> when CMA is under memory pressure.
>
> The issue is that test_pages_isolated checks if the range is free by
> checking that all pages in the range are buddy pages. For this to work
> the start pfn needs to be aligned to the start of the higher order buddy
> page that contains it, if there is one.
>
> This is not a problem for the memory hotplug code, as it always offlines
> whole pageblocks, but CMA may want to isolate a smaller range. So for
> the check to work correctly it down-aligns the start pfn to the higher
> order buddy page. As the zone is not yet locked at that point a
> concurrent page free might coalesce the pages to be checked into an
> even bigger buddy page, causing the check to fail, while all pages are
> in fact buddy pages.
>
> By moving the zone locking to the caller of the test function, it's
> possible to take the lock before CMA tries to find the proper start page
> and to prevent any concurrent page coalescing until the check is finished.

I think this patch cannot prevent the same race in
isolate_freepages_range(). If buddy merging happens after
test_pages_isolated() has passed, isolate_freepages_range() can no
longer see the buddy page it expects and will fail.
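
To make the window concrete, here is a toy user-space sketch, not kernel
code. Everything named toy_* is made up for illustration, the zone is just
an array that records the order of each free block at its head pfn, and the
"concurrent" free is simply executed between the two steps. It only models
the idea that a start pfn computed before a merge stops being a buddy head
after the merge.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_PAGES  16
#define TOY_MAX_ORDER 4
#define TOY_NOT_HEAD  (-1)

/* Hypothetical toy zone: head_order[pfn] is the order of the free buddy
 * block starting at pfn, or TOY_NOT_HEAD if pfn is not a free block head. */
struct toy_zone {
	int head_order[TOY_NR_PAGES];
};

static void toy_zone_init(struct toy_zone *z)
{
	for (int i = 0; i < TOY_NR_PAGES; i++)
		z->head_order[i] = TOY_NOT_HEAD;
}

/* Mark an aligned block of the given order as free, without merging. */
static void toy_free_block(struct toy_zone *z, unsigned long pfn, int order)
{
	assert((pfn & ((1UL << order) - 1)) == 0);
	z->head_order[pfn] = order;
}

/* Coalesce the free block at pfn with its equal-order free buddy. */
static bool toy_merge(struct toy_zone *z, unsigned long pfn, int order)
{
	unsigned long buddy = pfn ^ (1UL << order);
	unsigned long head = pfn & ~(1UL << order);

	if (buddy >= TOY_NR_PAGES)
		return false;
	if (z->head_order[pfn] != order || z->head_order[buddy] != order)
		return false;
	z->head_order[pfn] = TOY_NOT_HEAD;
	z->head_order[buddy] = TOY_NOT_HEAD;
	z->head_order[head] = order + 1;
	return true;
}

/* Down-align start until a free block head is found; fall back to start. */
static unsigned long toy_find_head(const struct toy_zone *z,
				   unsigned long start)
{
	unsigned long head = start;
	int order = 0;

	while (z->head_order[head] == TOY_NOT_HEAD) {
		if (++order > TOY_MAX_ORDER)
			return start;
		head &= ~0UL << order;
	}
	return head;
}

/* Walk [pfn, end) and require every step to land on a free block head. */
static bool toy_range_is_free(const struct toy_zone *z,
			      unsigned long pfn, unsigned long end)
{
	assert(end <= TOY_NR_PAGES);
	while (pfn < end) {
		int order = z->head_order[pfn];

		if (order == TOY_NOT_HEAD)
			return false;
		pfn += 1UL << order;
	}
	return true;
}

struct toy_result {
	unsigned long head;	/* start pfn computed before the merge */
	bool check_before;	/* check with that start, before the merge */
	bool walk_after;	/* walk with the stale start, after the merge */
	bool walk_relooked;	/* walk after redoing the lookup */
};

struct toy_result toy_race_after_check(void)
{
	struct toy_zone z;
	struct toy_result r;

	toy_zone_init(&z);
	toy_free_block(&z, 4, 2);		/* pfns 4..7 free, 0..3 in use */

	r.head = toy_find_head(&z, 5);		/* caller wants pfns 5..7 */
	r.check_before = toy_range_is_free(&z, r.head, 8);

	/* "Concurrent" free of pfns 0..3, which merges with 4..7. */
	toy_free_block(&z, 0, 2);
	toy_merge(&z, 0, 2);

	r.walk_after = toy_range_is_free(&z, r.head, 8);
	r.walk_relooked = toy_range_is_free(&z, toy_find_head(&z, 5), 8);
	return r;
}
```

In this sketch the check passes with the start pfn found before the merge,
the later walk from that same pfn fails although every page is free, and
it only succeeds again once the lookup is redone. In the toy model, at
least, the lookup, the check and the later walk would all have to sit in
one critical section to close the window.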

Thanks.