Re: [PATCH 1/1] mm: thp: Redefine default THP defrag behaviour disable it by default

From: Mel Gorman
Date: Fri Feb 26 2016 - 06:13:25 EST

On Fri, Feb 26, 2016 at 12:02:19AM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 25, 2016 at 07:56:13PM +0000, Mel Gorman wrote:
> > Which is a specialised case that does not apply to all users. Remember
> > that the data showed that a basic streaming write of an anon mapping on
> > a freshly booted NUMA system was enough to stall the process for long
> > periods of time.
> >
> > Even in the specialised case, a single VM reaching its peak performance
> > may rely on getting THP but if that's at the cost of reclaiming other
> > pages that may be hot to a second VM then it's an overall loss.
> You're mixing the concern that THP will use more memory with the
> cost of defragmentation.

There are three cases:

1. THP was allocated when the application only required 4K and consumes
   more memory. This has always been the case but is not the concern here.
2. Memory is fragmented but there are enough free pages. In this case,
   only compaction is required and the memory footprint is the same.
3. Memory is fragmented and pages have to be freed before compaction.

It's 3 I was referring to even though all the cases are important.

> If you've memory issues and you are ok to
> sacrifice performance for swapping less you should disable THP, set it
> to never, and that's it.

I want to get to the half-way point where THP is used if easily available
without worrying that there will be stalls at some point in the future
or requiring application modification for madvise. That's better than the
all or nothing approach that users are currently faced with. I wince every
time I see a tuning guide suggesting THP be disabled and have handled too
many bugs where disabling THP was a workaround.
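
For reference, the "application modification for madvise" mentioned above
amounts to something like the sketch below. It is illustrative only and not
taken from any patch; map_with_thp_hint() is a made-up helper name, while
mmap(), madvise() and MADV_HUGEPAGE are the standard Linux interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and mark the
 * range as a THP candidate. Returns NULL if the mapping itself fails.
 */
void *map_with_thp_hint(size_t len)
{
	void *p;

	assert(len > 0);	/* mmap() rejects zero-length mappings */

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/*
	 * The hint is advisory. It can fail, for example with EINVAL on a
	 * kernel built without THP, and the mapping is still usable with
	 * base pages, so the error is reported but not treated as fatal.
	 */
	if (madvise(p, len, MADV_HUGEPAGE) != 0)
		perror("madvise(MADV_HUGEPAGE)");

	return p;
}
```

Every allocator or application that wants THP has to carry a call like this,
which is the burden the half-way point is meant to avoid.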

That said, you made a number of important points. I'm not going to respond
to them individually because I believe I understand your concerns and now
agree with them. I've prototyped a patch that modifies the defrag tunable
as follows:

1. By default, "madvise" will direct reclaim/compact for applications
   that specifically requested that behaviour. This will avoid breaking
   MADV_HUGEPAGE which you mentioned in a few places.
2. "never" will never reclaim anything and was the default behaviour of
   version 1 but will not be the default in version 2.
3. "defer" will wake kswapd which will reclaim or wake kcompactd,
   whichever is necessary. This is new but avoids stalls while helping
   khugepaged do its work quickly in the near future.
4. "always" will direct reclaim/compact just like today's behaviour.
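
The four settings can be read as a small decision table, sketched below in
userspace C. This is not the prototype patch; all names are hypothetical.
Two points are assumptions that the list above does not spell out: that
"madvise" does nothing extra for regions without MADV_HUGEPAGE, and that
"defer" treats madvised and non-madvised regions the same way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the values of the defrag tunable. */
enum thp_defrag_mode {
	THP_DEFRAG_ALWAYS,
	THP_DEFRAG_DEFER,
	THP_DEFRAG_MADVISE,	/* proposed default */
	THP_DEFRAG_NEVER,
};

/* What a THP fault does when no huge page is immediately available. */
enum thp_defrag_action {
	THP_ACT_NONE,		/* fall back to base pages, no reclaim */
	THP_ACT_WAKE_KSWAPD,	/* background reclaim/compaction, no stall */
	THP_ACT_DIRECT,		/* direct reclaim/compaction, may stall */
};

enum thp_defrag_action thp_defrag_decide(enum thp_defrag_mode mode,
					 bool vma_madvised)
{
	switch (mode) {
	case THP_DEFRAG_ALWAYS:
		return THP_ACT_DIRECT;
	case THP_DEFRAG_DEFER:
		/* Assumption: same action whether or not madvise was used. */
		return THP_ACT_WAKE_KSWAPD;
	case THP_DEFRAG_MADVISE:
		/* Assumption: regions without the hint do no extra work. */
		return vma_madvised ? THP_ACT_DIRECT : THP_ACT_NONE;
	case THP_DEFRAG_NEVER:
		return THP_ACT_NONE;
	}
	assert(!"unknown defrag mode");
	return THP_ACT_NONE;
}
```

Only THP_ACT_DIRECT can stall the faulting process, so under the proposed
default only applications that asked for it pay that cost.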

I'm testing it at the moment to make sure each of the options behaves
correctly.

Mel Gorman