Re: [PATCH v2] mm: scale kswapd watermarks in proportion to memory

From: Johannes Weiner
Date: Thu Feb 25 2016 - 15:07:42 EST


Hi Joonsoo,

On Thu, Feb 25, 2016 at 09:37:44AM +0900, Joonsoo Kim wrote:
> On Mon, Feb 22, 2016 at 03:33:22PM -0800, Johannes Weiner wrote:
> > In machines with 140G of memory and enterprise flash storage, we have
> > seen read and write bursts routinely exceed the kswapd watermarks and
> > cause thundering herds in direct reclaim. Unfortunately, the only way
> > to tune kswapd aggressiveness is through adjusting min_free_kbytes -
> > the system's emergency reserves - which is entirely unrelated to the
> > system's latency requirements. In order to get kswapd to maintain a
> > 250M buffer of free memory, the emergency reserves need to be set to
> > 1G. That is a lot of memory wasted for no good reason.
> >
> > On the other hand, it's reasonable to assume that allocation bursts
> > and overall allocation concurrency scale with memory capacity, so it
> > makes sense to make kswapd aggressiveness a function of that as well.
> >
> > Change the kswapd watermark scale factor from the currently fixed 25%
> > of the tunable emergency reserve to a tunable 0.001% of memory.
>
> s/0.001%/0.1%

Of course, you are right. Thanks for pointing it out.
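
For the record, the arithmetic behind the correction: the factor is in
fractions of 10,000, so the default of 10 works out to 10/10,000 = 0.1%
of memory. Below is a minimal userspace sketch of that calculation. The
helper name watermark_distance() is made up for illustration and is not
code from the patch; it only assumes the unit and the maximum of 1000
stated in the documentation hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: distance between two
 * adjacent watermarks, in pages, for a given watermark_scale_factor.
 * The factor is expressed in fractions of 10,000.
 */
static uint64_t watermark_distance(uint64_t managed_pages,
				   unsigned int scale_factor)
{
	uint64_t quot = managed_pages / 10000;
	uint64_t rem = managed_pages % 10000;

	/* The documentation hunk gives 1000 (10% of memory) as the maximum. */
	assert(scale_factor <= 1000);

	/*
	 * Divide before multiplying and scale the remainder separately,
	 * so the product cannot overflow for large page counts.
	 */
	return quot * scale_factor + rem * scale_factor / 10000;
}
```

Assuming 4K pages, a 140G machine has 36700160 pages, so the default
of 10 gives 36700 pages, or roughly 143M between watermarks.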

Andrew, I'm attaching a drop-in replacement for what you have, since
it also fixes the changelog. But it might be easier to just fix these
two instances in place in the patch you already have.

> > @@ -803,6 +803,24 @@ performance impact. Reclaim code needs to take various locks to find freeable
> > directory and inode objects. With vfs_cache_pressure=1000, it will look for
> > ten times more freeable objects than there are.
> >
> > +=============================================================
> > +
> > +watermark_scale_factor:
> > +
> > +This factor controls the aggressiveness of kswapd. It defines the
> > +amount of memory left in a node/system before kswapd is woken up and
> > +how much memory needs to be free before kswapd goes back to sleep.
> > +
> > +The unit is in fractions of 10,000. The default value of 10 means the
> > +distances between watermarks are 0.001% of the available memory in the
> > +node/system. The maximum value is 1000, or 10% of memory.
>
> Ditto for 0.001%.