Re: [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated

From: Mike Kravetz
Date: Thu Feb 15 2018 - 19:41:08 EST


On 02/12/2018 02:20 PM, Mike Kravetz wrote:
> start_isolate_page_range() is used to set the migrate type of a
> page block to MIGRATE_ISOLATE while attempting to start a
> migration operation. It is assumed that only one thread is
> attempting such an operation, and due to the limited number of
> callers this is generally the case. However, there are no
> guarantees and it is 'possible' for two threads to operate on
> the same range.

I confirmed my suspicions that this is possible today.

As a test, I created a large CMA area at boot time. I wrote some
code to exercise large allocations and frees via cma_alloc()/cma_release().
At the same time, I allocated and freed gigantic pages via the
sysfs interface.

After a short period of running, 'free memory' on the system went to
zero. After stopping the tests, I observed that most zone normal
pageblocks were still marked MIGRATE_ISOLATE, and hence were not
available for allocation.

As mentioned in the commit message, I doubt we will see this in
normal operations. But my testing confirms that it is possible.
Therefore, we should consider a patch like this or some other form
of mitigation even if we don't move forward with adding the new
interface.
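
To illustrate the semantics the patch is after, here is a rough userspace
sketch, not kernel code. Every name in it (model_blocks, model_set_isolate,
model_start_isolate_range and so on) is hypothetical, a pthread mutex stands
in for zone->lock, and a small array stands in for the pageblock migrate
types. It only models the "check under the lock, fail with -EBUSY if already
isolated" behaviour, plus undoing the blocks isolated so far when a later
block in the range is busy.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_migratetype { MODEL_MOVABLE, MODEL_ISOLATE };

#define MODEL_NR_BLOCKS 8

enum model_migratetype model_blocks[MODEL_NR_BLOCKS];
pthread_mutex_t model_zone_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check and set under the lock: only one caller can win a given block. */
int model_set_isolate(int block)
{
	int ret = -EBUSY;

	pthread_mutex_lock(&model_zone_lock);
	if (model_blocks[block] != MODEL_ISOLATE) {
		model_blocks[block] = MODEL_ISOLATE;
		ret = 0;
	}
	pthread_mutex_unlock(&model_zone_lock);
	return ret;
}

/* Return a block to its (assumed) original type. */
void model_unset_isolate(int block)
{
	pthread_mutex_lock(&model_zone_lock);
	model_blocks[block] = MODEL_MOVABLE;
	pthread_mutex_unlock(&model_zone_lock);
}

/*
 * Isolate blocks [start, end). If any block is already isolated,
 * release only the blocks this call isolated and return -EBUSY.
 */
int model_start_isolate_range(int start, int end)
{
	assert(0 <= start && start <= end && end <= MODEL_NR_BLOCKS);

	for (int i = start; i < end; i++) {
		if (model_set_isolate(i)) {
			while (--i >= start)
				model_unset_isolate(i);
			return -EBUSY;
		}
	}
	return 0;
}

/* Release blocks [start, end); only the owner of the range should call this. */
void model_undo_isolate_range(int start, int end)
{
	assert(0 <= start && start <= end && end <= MODEL_NR_BLOCKS);

	for (int i = start; i < end; i++)
		model_unset_isolate(i);
}
```

In this model, a second caller whose range overlaps one that is already
isolated gets -EBUSY and leaves the first caller's blocks untouched, so
only the caller that set the isolation ever clears it.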

--
Mike Kravetz

>
> Since start_isolate_page_range() is called at the beginning of
> such operations, have it return -EBUSY if MIGRATE_ISOLATE is
> already set.
>
> This will allow start_isolate_page_range to serve as a
> synchronization mechanism and will allow for more general use
> of callers making use of these interfaces.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> ---
> mm/page_alloc.c | 8 ++++----
> mm/page_isolation.c | 10 +++++++++-
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 76c9688b6a0a..064458f317bf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7605,11 +7605,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
> * @gfp_mask: GFP mask to use during compaction
> *
> * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> - * aligned, however it's the caller's responsibility to guarantee that
> - * we are the only thread that changes migrate type of pageblocks the
> - * pages fall in.
> + * aligned. The PFN range must belong to a single zone.
> *
> - * The PFN range must belong to a single zone.
> + * The first thing this routine does is attempt to MIGRATE_ISOLATE all
> + * pageblocks in the range. Once isolated, the pageblocks should not
> + * be modified by others.
> *
> * Returns zero on success or negative error code. On success all
> * pages which PFN is in [start, end) are allocated for the caller and
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 165ed8117bd1..e815879d525f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -28,6 +28,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
>
> spin_lock_irqsave(&zone->lock, flags);
>
> + /*
> + * We assume we are the only ones trying to isolate this block.
> + * If MIGRATE_ISOLATE already set, return -EBUSY
> + */
> + if (is_migrate_isolate_page(page))
> + goto out;
> +
> pfn = page_to_pfn(page);
> arg.start_pfn = pfn;
> arg.nr_pages = pageblock_nr_pages;
> @@ -166,7 +173,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
> * future will not be allocated again.
> *
> * start_pfn/end_pfn must be aligned to pageblock_order.
> - * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
> + * Returns 0 on success and -EBUSY if any part of range cannot be isolated
> + * or any part of the range is already set to MIGRATE_ISOLATE.
> */
> int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> unsigned migratetype, bool skip_hwpoisoned_pages)
>