Re: [merged] mm-compaction-abort-free-scanner-if-split-fails.patch removed from -mm tree

From: Joonsoo Kim
Date: Wed Jun 29 2016 - 04:09:16 EST


On Tue, Jun 28, 2016 at 07:03:04PM -0700, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
>
> The patch titled
> Subject: mm, compaction: abort free scanner if split fails
> has been removed from the -mm tree. Its filename was
> mm-compaction-abort-free-scanner-if-split-fails.patch
>
> This patch was dropped because it was merged into mainline or a subsystem tree
>
> ------------------------------------------------------
> From: David Rientjes <rientjes@xxxxxxxxxx>
> Subject: mm, compaction: abort free scanner if split fails
>
> If the memory compaction free scanner cannot successfully split a free
> page (only possible due to per-zone low watermark), terminate the free
> scanner rather than continuing to scan memory needlessly. If the
> watermark is insufficient for a free page of order <= cc->order, then
> terminate the scanner since all future splits will also likely fail.
>
> This prevents the compaction freeing scanner from scanning all memory on
> very large zones (very noticeable for zones > 128GB, for instance) when
> all splits will likely fail while holding zone->lock.
>
> compaction_alloc() iterating a 128GB zone has been benchmarked to take
> over 400ms on some systems whereas any free page isolated and ready to
> be split ends up failing in split_free_page() because of the low
> watermark check and thus the iteration continues.
>
> The next time compaction occurs, the freeing scanner will likely start
> at the end of the zone again since no success was made previously and
> we get the same lengthy iteration until the zone is brought above the
> low watermark. All thp page faults can take >400ms in such a state
> without this fix.
>
> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
>
> mm/compaction.c | 39 +++++++++++++++++++++------------------
> 1 file changed, 21 insertions(+), 18 deletions(-)
>
> diff -puN mm/compaction.c~mm-compaction-abort-free-scanner-if-split-fails mm/compaction.c
> --- a/mm/compaction.c~mm-compaction-abort-free-scanner-if-split-fails
> +++ a/mm/compaction.c
> @@ -441,25 +441,23 @@ static unsigned long isolate_freepages_b
>
>                  /* Found a free page, break it into order-0 pages */
>                  isolated = split_free_page(page);
> +                if (!isolated)
> +                        break;
> +
>                  total_isolated += isolated;
> +                cc->nr_freepages += isolated;
>                  for (i = 0; i < isolated; i++) {
>                          list_add(&page->lru, freelist);
>                          page++;
>                  }
> -
> -                /* If a page was split, advance to the end of it */
> -                if (isolated) {
> -                        cc->nr_freepages += isolated;
> -                        if (!strict &&
> -                                cc->nr_migratepages <= cc->nr_freepages) {
> -                                blockpfn += isolated;
> -                                break;
> -                        }
> -
> -                        blockpfn += isolated - 1;
> -                        cursor += isolated - 1;
> -                        continue;
> +                if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
> +                        blockpfn += isolated;
> +                        break;
>                  }
> +                /* Advance to the end of split page */
> +                blockpfn += isolated - 1;
> +                cursor += isolated - 1;
> +                continue;
>
>  isolate_fail:
>                  if (strict)
> @@ -469,6 +467,9 @@ isolate_fail:
>
>          }
>
> +        if (locked)
> +                spin_unlock_irqrestore(&cc->zone->lock, flags);
> +
>          /*
>           * There is a tiny chance that we have read bogus compound_order(),
>           * so be careful to not go outside of the pageblock.
> @@ -490,9 +491,6 @@ isolate_fail:
>          if (strict && blockpfn < end_pfn)
>                  total_isolated = 0;
>
> -        if (locked)
> -                spin_unlock_irqrestore(&cc->zone->lock, flags);
> -
>          /* Update the pageblock-skip if the whole pageblock was scanned */
>          if (blockpfn == end_pfn)
>                  update_pageblock_skip(cc, valid_page, total_isolated, false);
> @@ -1011,6 +1009,7 @@ static void isolate_freepages(struct com
>                                  block_end_pfn = block_start_pfn,
>                                  block_start_pfn -= pageblock_nr_pages,
>                                  isolate_start_pfn = block_start_pfn) {
> +                unsigned long isolated;
>
>                  /*
>                   * This can iterate a massively long zone without finding any
> @@ -1035,8 +1034,12 @@ static void isolate_freepages(struct com
>                          continue;
>
>                  /* Found a block suitable for isolating free pages from. */
> -                isolate_freepages_block(cc, &isolate_start_pfn,
> -                                        block_end_pfn, freelist, false);
> +                isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> +                                                block_end_pfn, freelist, false);
> +                /* If isolation failed early, do not continue needlessly */
> +                if (!isolated && isolate_start_pfn < block_end_pfn &&
> +                    cc->nr_migratepages > cc->nr_freepages)
> +                        break;

Hello, David.

Minchan found a bug in this patch.

isolate_freepages_block() can return a positive number even when it is
stopped early by memory shortage, because pages isolated before the
failed split have already been counted. In that case, the branch above
does not catch the situation, since 'isolated' is positive, and
execution falls through to the following code:

        if ((cc->nr_freepages >= cc->nr_migratepages) || XXX)
                ...
        else
                VM_BUG_ON(isolate_start_pfn < block_end_pfn);

In this case, cc->nr_freepages can be lower than cc->nr_migratepages
while isolate_start_pfn < block_end_pfn, so the else branch is taken
and the VM_BUG_ON() fires.

If my analysis is correct, please fix it.

Thanks.