Re: [patch v2 4/4] mm, thp: do not perform sync compaction on pagefault

From: Vlastimil Babka
Date: Mon May 05 2014 - 10:48:35 EST


On 05/02/2014 10:29 PM, David Rientjes wrote:
On Fri, 2 May 2014, Mel Gorman wrote:

The page locks I'm referring to is the lock_page() in __unmap_and_move()
that gets called for sync compaction after the migrate_pages() iteration
makes a few passes and unsuccessfully grabs it. This becomes a forced
migration since __unmap_and_move() returns -EAGAIN when the trylock fails.


Can that be fixed then instead of disabling it entirely?


We could return -EAGAIN when the trylock_page() fails for
MIGRATE_SYNC_LIGHT. It would become a forced migration but we ignore that
currently for MIGRATE_ASYNC, and I could extend it to be ignored for
MIGRATE_SYNC_LIGHT as well.

We have perf profiles from one workload in particular that shows
contention on i_mmap_mutex (anon isn't interesting since the vast majority
of memory on this workload [120GB on a 128GB machine] is has a gup pin and
doesn't get isolated because of 119d6d59dcc0 ("mm, compaction: avoid
isolating pinned pages")) between cpus all doing memory compaction trying
to fault thp memory.


Abort SYNC_LIGHT compaction if the mutex is contended.


Yeah, I have patches for that as well but we're waiting to see if they are
actually needed when sync compaction is disabled for thp. If we aren't
actually going to disable it entirely, then I can revive those patches if
the contention becomes such an issue.

That's one example that we've seen, but the fact remains that at times
sync compaction will iterate the entire 128GB machine and not allow an
order-9 page to be allocated and there's nothing to preempt it like the
need_resched() or lock contention checks that async compaction has.

Make compact_control->sync the same enum field and check for contention
on the async/sync_light case but leave it for sync if compacting via the
proc interface?


Ok, that certainly can be done, I wasn't sure you would be happy with such
a change. I'm not sure there's so much of a difference between the new
compact_control->sync == MIGRATE_ASYNC and == MIGRATE_SYNC_LIGHT now,
though. Would it make sense to remove MIGRATE_SYNC_LIGHT entirely from
the page allocator, i.e. remove sync_migration entirely, and just retry
with a second call to compaction before failing instead?

Maybe we should step back and rethink the conditions for all sources of compaction to better balance effort vs desired outcome, so distinguish 4 modes (just quick thoughts):

1) kswapd - we don't want to slow it down too much, thus async, uses cached pfns, honors skip bits, but does not give up on order=X blocks of pages just because some page was unable to be isolated/migrated (so gradually it might clean up such block over time).

2) hugepaged - it can afford to wait, so async + sync, use cached pfns, ignore skip bits on sync, also do not give up on order=THP blocks (?)

3) direct compaction on allocation of order X- we want to avoid allocation latencies, so it should skip over order=X blocks of pages as soon as it cannot isolate some page in such block (and put back already-isolated pages instead of migrating them). As for other parameters I'm not sure. Sync should still be possible thanks to the deferred_compaction logic? Maybe somehow transform the yes/no answer of deferred compaction to a number of how many pageblocks it should try before giving up, so there are no latency spikes limited only by the size of the zone?

4) compaction from proc interface - sync, reset cached pfn's, ignore deferring, ignore skip bits...

Vlastimil


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/