Re: Regression in mobility grouping?

From: Johannes Weiner
Date: Wed Sep 28 2016 - 11:39:45 EST

Next message: SF Markus Elfring: "[PATCH 03/10] md/dm-crypt: Rename a jump label in crypt_message()"
Previous message: Greg KH: "Re: [PATCH 1/1] Staging: android: ion: Fixed coding style issues"
In reply to: Vlastimil Babka: "Re: Regression in mobility grouping?"
Next in thread: Johannes Weiner: "Re: Regression in mobility grouping?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Vlastimil,

On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> On 09/28/2016 03:41 AM, Johannes Weiner wrote:
> > Hi guys,
> >
> > we noticed what looks like a regression in page mobility grouping
> > during an upgrade from 3.10 to 4.0. Identical machines, workloads, and
> > uptime, but /proc/pagetypeinfo on 3.10 looks like this:
> >
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
> > Node 1, zone   Normal           815          433        31518            2            0
> >
> > and on 4.0 like this:
> >
> > Number of blocks type     Unmovable  Reclaimable      Movable      Reserve          CMA      Isolate
> > Node 1, zone   Normal          3880         3530        25356            2            0            0
>
> It's worth keeping in mind that this doesn't reflect where the actual
> unmovable pages reside. It might be that in 3.10 they are spread within
> the movable pages. IIRC, enabling page_owner (not sure about 4.0; I think
> there were some later fixes) can augment pagetypeinfo with at least
> some statistics on polluted pageblocks.

Thanks, I'll look at the mixed block counts. I failed to make clear
that, while we saw the issue in the switch from 3.10 to 4.0, I only
mentioned those two kernels as last known good / first known bad. Later
kernels (we tried 4.6) look the same. This appears to be a
regression in (higher-order) allocation service quality somewhere
after 3.10 that persists into current kernels.

> Does e.g. /proc/meminfo suggest how much unmovable/reclaimable memory
> should be allocated, and whether it would fill the respective
> pageblocks or whether they are poorly utilized?

They are very poorly utilized. On a machine where anon and cache pages
alone made up 90% of memory, we saw 50% of the page blocks marked
unmovable.

> > 4.0 is either polluting pageblocks more aggressively at allocation, or
> > is not able to make pageblocks movable again when the reclaimable and
> > unmovable allocations are released. Invoking compaction manually
> > (/proc/sys/vm/compact_memory) is not bringing them back, either.
> >
> > The problem we are debugging is that these machines have a very high
> > rate of order-3 allocations (fdtable during fork, network rx), and
> > after the upgrade allocstalls have increased dramatically. I'm not
> > entirely sure this is the same issue, since even order-0 allocations
> > are struggling, but the mobility grouping in itself looks problematic.
> >
> > I'm still going through the changes relevant to mobility grouping in
> > that timeframe, but if this rings a bell for anyone, it would help. I
> > hate blaming random patches, but these caught my eye:
> >
> > 9c0415e mm: more aggressive page stealing for UNMOVABLE allocations
> > 3a1086f mm: always steal split buddies in fallback allocations
> > 99592d5 mm: when stealing freepages, also take pages created by splitting buddy page
>
> Also check the changelogs for mentions of earlier commits; e.g. 99592d5
> should be restoring behavior that changed in 3.12-3.13, and you are
> upgrading from 3.10.

Good point.

> > The changelog states that by aggressively stealing split buddy pages
> > during a fallback allocation we avoid subsequent stealing. But since
> > there are generally more movable/reclaimable pages available, there is
> > less falling back and stealing of free pages on behalf of movable
> > allocations. Won't this mean that we could expect exactly that result:
> > growing numbers of unmovable blocks, while rarely stealing them back in
> > movable allocation fallbacks? And wouldn't the expansion of !MOVABLE
> > blocks make compaction less and less effective over time, too, since it
> > doesn't consider any !MOVABLE block a suitable migration target?
>
> Yeah, this is an issue with compaction that was brought up recently, and
> I want to tackle it next.

Agreed, it would be nice if compaction could reclaim unmovable and
reclaimable blocks whose polluting allocations have since been freed.

But there is a limit to how lazy mobility grouping can be and still
expect compaction to fix it up. If 50% of the page blocks are marked
unmovable, incoming polluting allocations are no longer packed densely.
When spread out the right way, even just a few of those can have a
devastating impact on overall compactability.

So regardless of future compaction improvements, we need to get
anti-fragmentation accuracy in the allocator closer to 3.10 levels again.

> > Attached are the full /proc/pagetypeinfo and /proc/buddyinfo from both
> > kernels on machines with similar uptimes and directly after invoking
> > compaction. As you can see, the buddy lists are much more fragmented
> > on 4.0, with unmovable/reclaimable allocations polluting more blocks.
> >
> > Any thoughts on this would be greatly appreciated. I can test patches.
>
> I guess testing a revert of 9c0415e could give us some idea. Commit
> 3a1086f shouldn't result in pageblock marking differences and, as I said
> above, 99592d5 should just be restoring what 3.10 did.

I can give this a shot, but note that this commit only makes unmovable
stealing more aggressive. We see the reclaimable block count up as well.

The workload is fairly variable, so it'll take about a day to smooth
out a meaningful average.

Thanks for your insights, Vlastimil!
