Re: [RFC 13/13] mm, compaction: fix and improve watermark handling

From: Mel Gorman
Date: Wed May 18 2016 - 09:50:22 EST


On Mon, May 16, 2016 at 11:25:05AM +0200, Michal Hocko wrote:
> On Tue 10-05-16 09:36:03, Vlastimil Babka wrote:
> > Compaction has been using watermark checks when deciding whether it was
> > successful, and whether compaction is at all suitable. There are few problems
> > with these checks.
> >
> > - __compact_finished() uses low watermark in a check that has to pass if
> > the direct compaction is to finish and allocation should succeed. This is
> > too pessimistic, as the allocation will typically use min watermark. It
> > may happen that during compaction, we drop below the low watermark (due to
> > parallel activity), but still form the target high-order page. By checking
> > against low watermark, we might needlessly continue compaction. After this
> > patch, the check uses direct compactor's alloc_flags to determine the
> > watermark, which is effectively the min watermark.
>
> OK, this makes some sense. It would be great if we could have at least
> some clarification why the low wmark has been used previously. Probably
> Mel can remember?
>

Two reasons -- it was a very rough estimate of whether enough pages are free
for compaction to have any chance. Secondly, it was to minimise the risk
that compaction would isolate so many pages that the zone was completely
depleted. This was a concern during the initial prototype of compaction.

> > - __compaction_suitable() then checks the low watermark plus a (2 << order) gap
> > to decide if there's enough free memory to perform compaction. This check
>
> And this was a real head scratcher when I started looking into the
> compaction recently. Why do we need to be above low watermark to even
> start compaction. Compaction uses additional memory only for a short
> period of time and then releases the already migrated pages.
>

Simply minimising the risk that compaction would deplete the entire
zone. Sure, it hands pages back shortly afterwards. At the time of the
initial prototype, page migration was severely broken and the system was
constantly crashing. The cautious checks were left in place after page
migration was fixed as there wasn't a compelling reason to remove them
at the time.

--
Mel Gorman
SUSE Labs