On Tue, 29 Jul 2014, Vlastimil Babka wrote:
> > I think there are two ways to go about it:
> >
> >  - allow a single thp fault to be expensive and then rely on deferred
> >    compaction to avoid subsequent calls in the near future, or
> >
> >  - try to make all thp faults as inexpensive as possible so that the
> >    cumulative effect of faulting large amounts of memory doesn't end
> >    up with lengthy stalls.
> >
> > Both of these are complex because of the potential for concurrent
> > calls to memory compaction when faulting thp on several cpus.
> >
> > I also think the second point from that email still applies: we
> > should abort isolating pages within a pageblock for migration once
> > the pageblock can no longer allow a cc->order allocation to succeed.
>
> That was the RFC patch 15, I hope to reintroduce it soon.
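
To make the deferred compaction half of the first option concrete, this
is roughly the bookkeeping it relies on (simplified from
mm/compaction.c; locking and the compact_considered clamp are elided,
so this is a sketch rather than the exact code):

/*
 * Each failure bumps compact_defer_shift, so 1 << compact_defer_shift
 * subsequent attempts are skipped before compaction is retried for
 * this zone and order.
 */
void defer_compaction(struct zone *zone, int order)
{
	zone->compact_considered = 0;
	if (zone->compact_defer_shift < COMPACT_MAX_DEFER_SHIFT)
		zone->compact_defer_shift++;
	if (order < zone->compact_order_failed)
		zone->compact_order_failed = order;
}

/* Returns true if this compaction attempt should be skipped. */
bool compaction_deferred(struct zone *zone, int order)
{
	unsigned long defer_limit = 1UL << zone->compact_defer_shift;

	if (order < zone->compact_order_failed)
		return false;

	return ++zone->compact_considered < defer_limit;
}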
Which of the points above are you planning to address in another patch?
I think that approach would make the two options above mutually
exclusive.
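
For reference, the abort I have in mind is along these lines (an
illustrative sketch with a hypothetical helper, not your RFC patch 15):

/*
 * If a page within the current cc->order aligned chunk of the
 * pageblock fails isolation, no free page of that order can form
 * there, so skip ahead to the next aligned chunk instead of
 * isolating the remainder.
 */
static unsigned long next_aligned_chunk(struct compact_control *cc,
					unsigned long pfn)
{
	return ALIGN(pfn + 1, 1UL << cc->order);
}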
> You could still test it meanwhile to see if you see the same extfrag
> regression as me. In my tests, kswapd/khugepaged wasn't doing enough
> work to defragment the pageblocks that the stress-highalloc benchmark
> (configured to behave like a thp page fault) was skipping.
The initial regression I encountered was on a 128GB machine where async
compaction caused faulting 64MB of transparent hugepages to stall
excessively. I don't see how kswapd can address this when there's no
memory pressure, nor how khugepaged can address it with its default
settings, which make it very slow.
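
Rough arithmetic on why (my numbers, assuming the x86_64 defaults from
mm/huge_memory.c: pages_to_scan = HPAGE_PMD_NR * 8 and
scan_sleep_millisecs = 10000):

#include <stdio.h>

int main(void)
{
	unsigned long pages_to_scan = 512 * 8;	/* 4096 base pages */
	unsigned long scan_sleep_s = 10;	/* 10s between rounds */
	unsigned long mb_per_round = pages_to_scan * 4 / 1024;	/* 16MB */

	printf("at most %luMB scanned every %lus -> ~%lus to cover 64MB\n",
	       mb_per_round, scan_sleep_s, 64 / mb_per_round * scan_sleep_s);
	return 0;
}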
Another idea I had is to do async memory compaction for thp only on
local zones and avoid defragmenting remote zones, since in my
experimentation remote thp memory causes a performance degradation
compared to regular pages. If that solution were to involve
zone_reclaim_mode and a test of node_distance() > RECLAIM_DISTANCE, I
think that would be acceptable as well.
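
Something along these lines is what I'm thinking of (an illustrative
sketch with a hypothetical helper, not a posted patch):

/*
 * Only attempt async compaction for a thp fault if the candidate zone
 * is within RECLAIM_DISTANCE of the faulting node, the same distance
 * cutoff that zone_reclaim_mode uses.
 */
static bool thp_async_compact_allowed(struct zone *zone)
{
	return node_distance(numa_node_id(), zone_to_nid(zone))
						<= RECLAIM_DISTANCE;
}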