Re: [RFC 3/4] mm, thp: try fault allocations only if we expect them to succeed

From: David Rientjes
Date: Wed Jun 17 2015 - 21:20:32 EST


On Mon, 11 May 2015, Vlastimil Babka wrote:

> Since we track THP availability for khugepaged THP collapses, we can use it
> also for page fault THP allocations. If khugepaged with its sync compaction
> is not able to allocate a hugepage, then it's unlikely that the less involved
> attempt on page fault would succeed, and the cost could be higher than THP
> benefits. Also clear the THP availability flag if we do attempt and fail to
> allocate during page fault, and set the flag if we are freeing a large enough
> page from any context. The latter doesn't include merges, as that's a fast
> path and unlikely to make much difference.
>

That depends on how long {scan,alloc}_sleep_millisecs are. If khugepaged
fails to allocate a hugepage on all nodes, it sleeps for
alloc_sleep_millisecs (default 60s). If memory is freed immediately after
that, thp page faults still don't happen again for 60s. That's scary to me
when thp_avail_nodes is clear, a large process terminates, and then
immediately starts back up. None of its memory is faulted as thp and,
depending on how large it is, khugepaged may fail to allocate hugepages
when it wakes back up, so it never scans (the only reason why
thp_avail_nodes was clear before it terminated originally).

I'm not sure that approach can work unless the inference of whether a
hugepage can be allocated at a given time is a very good indicator of
whether a hugepage can be allocated alloc_sleep_millisecs later, and I'm
afraid that's not the case.

I'm very happy that you're looking at thp fault latency and the role that
khugepaged can play in accepting responsibility for defragmentation,
though. It's an area that has caused me some trouble lately and I'd like
to be able to improve.

We see an immediate benefit when experimenting with doing synchronous
memory compactions of all memory every 15s. That's done using a cronjob
rather than khugepaged, but the idea is the same.

What would your thoughts be about doing something radical like

- having khugepaged do synchronous memory compaction of all memory at
regular intervals,

- track how many pageblocks are free for thp memory to be allocated,

- terminate collapsing if free pageblocks are below a threshold,

- trigger a khugepaged wakeup at page fault when that number of
pageblocks falls below a threshold,

- determine the next full sync memory compaction based on how many
pageblocks were defragmented on the last wakeup, and

- avoid memory compaction for all thp page faults.

(I'd ignore what is actually the responsibility of khugepaged and what is
done in task work at this time.)
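
To make the list above a bit more concrete, here is a rough userspace
sketch of the decision logic only. It is not kernel code: the struct, the
function names, the two separate thresholds and the halve/double interval
heuristic are all assumptions made for illustration, and none of them
exist in the tree or in the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; no name here is a kernel symbol. */
struct thp_policy {
	unsigned long free_pageblocks;    /* pageblocks currently free for thp */
	unsigned long collapse_threshold; /* assumed: stop collapsing below this */
	unsigned long wakeup_threshold;   /* assumed: wake khugepaged below this */
	unsigned int min_interval_ms;     /* assumed bounds on the compaction period */
	unsigned int max_interval_ms;
};

/* khugepaged side: terminate collapsing when free pageblocks run low. */
static bool khugepaged_may_collapse(const struct thp_policy *p)
{
	assert(p != NULL);
	return p->free_pageblocks >= p->collapse_threshold;
}

/*
 * Fault side: the fault never compacts in this model, it only decides
 * whether khugepaged should be woken up.
 */
static bool fault_should_wake_khugepaged(const struct thp_policy *p)
{
	assert(p != NULL);
	return p->free_pageblocks < p->wakeup_threshold;
}

/*
 * Pick the delay until the next full sync compaction. Assumed heuristic:
 * if the last pass defragmented anything, halve the interval; if it found
 * nothing to do, double it. The result is clamped to [min, max].
 */
static unsigned int next_compaction_interval_ms(const struct thp_policy *p,
						unsigned int cur_ms,
						unsigned long defragmented)
{
	unsigned int next;

	assert(p != NULL);
	if (defragmented > 0)
		next = cur_ms / 2;
	else if (cur_ms > p->max_interval_ms / 2)
		next = p->max_interval_ms; /* avoid overflow when doubling */
	else
		next = cur_ms * 2;

	if (next < p->min_interval_ms)
		next = p->min_interval_ms;
	if (next > p->max_interval_ms)
		next = p->max_interval_ms;
	return next;
}
```

Whether the interval should shrink or grow after a productive pass is
exactly the kind of thing that would need measuring; the direction chosen
here is only a placeholder.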