On Wed, Sep 9, 2015 at 2:39 PM, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
On Tue, Sep 08, 2015 at 05:26:13PM +0900, Joonsoo Kim wrote:
2015-08-24 21:30 GMT+09:00 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>:
The primary purpose of watermarks is to ensure that reclaim can always
make forward progress in PF_MEMALLOC context (kswapd and direct reclaim).
These assume that order-0 allocations are all that is necessary for
forward progress.
High-order watermarks serve a different purpose. Kswapd had no high-order
awareness before they were introduced (https://lkml.org/lkml/2004/9/5/9).
This was particularly important when there were high-order atomic requests.
The watermarks both gave kswapd awareness and made a reserve for those
atomic requests.
There are two important side-effects of this. The most important is that
a non-atomic high-order request can fail even though free pages are available
and the order-0 watermarks are ok. The second is that high-order watermark
checks are expensive as the free list counts up to the requested order must
be examined.
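
To make the two side-effects concrete, here is a minimal userspace sketch of a per-order watermark check of the kind described above. It is a simplified model and not the kernel's code: the names legacy_zone, legacy_total_free and legacy_watermark_ok are hypothetical, and halving the mark at each order is an assumption made for illustration.

```c
#include <stdbool.h>

#define MAX_ORDER 11

/* Hypothetical model of a zone: number of free blocks at each order. */
struct legacy_zone {
    unsigned long nr_free[MAX_ORDER];
};

/* Total free base pages across all orders. */
static unsigned long legacy_total_free(const struct legacy_zone *z)
{
    unsigned long pages = 0;
    unsigned int o;

    for (o = 0; o < MAX_ORDER; o++)
        pages += z->nr_free[o] << o;
    return pages;
}

/*
 * Per-order watermark check (illustrative only). Pages held in blocks
 * smaller than the request cannot satisfy it, so they are subtracted
 * order by order, while the required mark is halved at each step
 * (an assumed scaling). Every order below the request is examined,
 * which is where the cost comes from.
 */
static bool legacy_watermark_ok(const struct legacy_zone *z,
                                unsigned int order, unsigned long min)
{
    long free_pages = (long)legacy_total_free(z);
    long mark = (long)min;
    unsigned int o;

    if (free_pages <= mark)
        return false;

    for (o = 0; o < order && o < MAX_ORDER; o++) {
        free_pages -= (long)(z->nr_free[o] << o);
        mark >>= 1;
        if (free_pages <= mark)
            return false;
    }
    return true;
}
```

In this model, a zone holding 1000 free order-0 pages and nothing larger passes an order-0 check against a mark of 100 but fails an order-2 check, which is the first side-effect. The loop over the lower orders is the second.
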
With the introduction of MIGRATE_HIGHATOMIC it is no longer necessary to
have high-order watermarks. Kswapd and compaction still need high-order
awareness which is handled by checking that at least one suitable high-order
page is free.
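
The replacement check described above can be sketched in the same style: pass the order-0 watermark first, then require only that one suitable block of at least the requested order is free. This is a hedged userspace model, not the patch itself. The names simple_zone, simple_mt and simple_watermark_ok are hypothetical, and the rule that only high-order atomic requests may count blocks in the high-atomic reserve is an assumption about how "suitable" is decided.

```c
#include <stdbool.h>

#define MAX_ORDER 11

/* Hypothetical migrate types for the model. */
enum simple_mt {
    MT_UNMOVABLE,
    MT_MOVABLE,
    MT_RECLAIMABLE,
    MT_HIGHATOMIC,   /* reserve for high-order atomic requests */
    MT_TYPES
};

/* Hypothetical model of a zone with per-order, per-type free counts. */
struct simple_zone {
    unsigned long free_pages;                   /* total free base pages */
    unsigned long nr_free[MAX_ORDER][MT_TYPES]; /* free blocks */
};

/*
 * Simplified check: the order-0 watermark must be met, and for a
 * high-order request at least one block of a suitable type and of
 * order >= the request must be free. No per-order arithmetic is done.
 */
static bool simple_watermark_ok(const struct simple_zone *z,
                                unsigned int order, unsigned long min,
                                bool high_atomic)
{
    unsigned int o;
    int mt;

    if (z->free_pages <= min)
        return false;
    if (order == 0)
        return true;

    for (o = order; o < MAX_ORDER; o++) {
        for (mt = 0; mt < MT_TYPES; mt++) {
            /* Assumption: the reserve is off-limits to ordinary requests. */
            if (mt == MT_HIGHATOMIC && !high_atomic)
                continue;
            if (z->nr_free[o][mt])
                return true;
        }
    }
    return false;
}
```

In this model a single free order-2 block in the reserve satisfies a high-order atomic request but not an ordinary one, and an ordinary order-2 request passes as soon as any non-reserve block of order 2 or larger is free.
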
I still don't think that one suitable high-order page is enough.
If fragmentation happens, there would be no free order-2 page. If kswapd
prepares only one free order-2 page, one of two successive process forks
(AFAIK, fork on x86 and ARM requires an order-2 page) must go to direct reclaim
to make another order-2 page, because kswapd cannot make one in that
short time. This adds latency for many high-order page requestors
when memory is fragmented.
So what do you suggest instead? A fixed number, some other heuristic?
You have pushed several times now for the series to focus on the latency
of standard high-order allocations but again I will say that it is outside
the scope of this series. If you want to take steps to reduce the latency
of ordinary high-order allocation requests that can sleep then it should
be a separate series.
I do believe https://lkml.org/lkml/2015/9/9/313 does a better job
here. I have to admit the patch header is a bit misleading, since
we don't actually exclude CMA pages; we just _fix_ the calculation in
the loop, which is utterly wrong otherwise.
~vitaly