Re: [PATCH v5 14/14] mm, compaction: try to capture the just-created high-order freepage

From: Joonsoo Kim
Date: Wed Jul 30 2014 - 10:19:20 EST


Oops... resend because of omitting everyone on CC.

2014-07-30 18:56 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>:
> On 07/30/2014 10:39 AM, Joonsoo Kim wrote:
>>
>> On Tue, Jul 29, 2014 at 05:34:37PM +0200, Vlastimil Babka wrote:
>>>
>>> Could do it in isolate_migratepages() for whole pageblocks only (as
>>> David's patch did), but that restricts the usefulness. Or maybe do
>>> it fine grained by calling isolate_migratepages_block() multiple
>>> times. But the overhead of multiple calls would probably suck even
>>> more for lower-order compactions. For CMA the added overhead is
>>> basically only checks for next_capture_pfn that will be always
>>> false, so predictable. And mostly just in branches where isolation
>>> is failing, which is not the CMA's "fast path" I guess?
>>
>>
>> You can do it find grained with compact_control's migratepages list
>> or new private list. If some pages are isolated and added to this list,
>> you can check pfn of page on this list and determine appropriate capture
>> candidate page. This approach can give us more flexibility for
>> choosing capture candidate without adding more complexity to
>> common function. For example, you can choose capture candidate if
>> there are XX isolated pages in certain range.
>
>
> Hm I see. But the logic added by page capture was also a prerequisity for
> the "[RFC PATCH V4 15/15] mm, compaction: do not migrate pages when that
> cannot satisfy page fault allocation"
> http://marc.info/?l=linux-mm&m=140551859423716&w=2
>
> And that could be hardly done by a post-isolation inspection of the
> migratepages list. And I haven't given up on that idea yet :)

Okay. I didn't look at that patch yet. I will try later :)

>
>>>> In __isolate_free_page(), we check zone_watermark_ok() with order 0.
>>>> But normal allocation logic would check zone_watermark_ok() with
>>>> requested
>>>> order. Your capture logic uses __isolate_free_page() and it would
>>>> affect compaction success rate significantly. And it means that
>>>> capture logic allocates high order page on page allocator
>>>> too aggressively compared to other component such as normal high order
>>>
>>>
>>> It's either that, or the extra lru drain that makes the different.
>>> But the "aggressiveness" would in fact mean better accuracy.
>>> Watermark checking may be inaccurate. Especially when memory is
>>> close to the watermark and there is only a single high-order page
>>> that would satisfy the allocation.
>>
>>
>> If this "aggressiveness" means better accuracy, fixing general
>> function, watermark_ok() is better than adding capture logic.
>
>
> That's if fixing the function wouldn't add significant overhead to all the
> callers. And making it non-racy and not prone to per-cpu counter drifts
> would certainly do that :(
>
>
>> But, I guess that there is a reason that watermark_ok() is so
>> conservative. If page allocator aggressively provides high order page,
>> future atomic high order page request cannot succeed easily. For
>> preventing this situation, watermark_ok() should be conservative.
>
>
> I don't think it's intentionally conservative, just unreliable. It tests two
> things together:
>
> 1) are there enough free pages for the allocation wrt watermarks?
> 2) does it look like that there is a free page of the requested order?

I don't think that watermark_ok()'s intention is checking if there is a free
page of the requested order. If we want to know it, we could use more
easy way something like below.

X = number of total freepage - number of freepage lower than requested order
If X is positive, we can conclude that there is at least one freepage
of requested order and this equation is easy to compute.

But, watermark_ok() doesn't do that. Instead, it uses mark value to determine
if we can go further. I guess that this means that allocation/reclaim logic want
to preserve certain level of high order freepages according to system memory
size, although I don't know what the reason is exactly. So
the "aggressiveness" on capture logic here could break what
allocation/reclaim want.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/