Re: Regression in mobility grouping?

From: Joonsoo Kim
Date: Thu Sep 29 2016 - 02:06:33 EST


On Wed, Sep 28, 2016 at 10:25:40PM -0400, Johannes Weiner wrote:
> On Wed, Sep 28, 2016 at 11:39:25AM -0400, Johannes Weiner wrote:
> > On Wed, Sep 28, 2016 at 11:00:15AM +0200, Vlastimil Babka wrote:
> > > I guess testing revert of 9c0415e could give us some idea. Commit
> > > 3a1086f shouldn't result in pageblock marking differences and as I said
> > > above, 99592d5 should be just restoring to what 3.10 did.
> >
> > I can give this a shot, but note that this commit makes only unmovable
> > stealing more aggressive. We see reclaimable blocks up as well.
>
> Quick update, I reverted back to stealing eagerly only on behalf of
> MIGRATE_RECLAIMABLE allocations in a 4.6 kernel:

Hello, Johannes.

I think it would be better to test 3.10 with the above patches
applied. Fragmentation depends not only on the policy itself but also
on the allocation/free pattern, and it is quite likely that this
pattern has changed across such a large gap in kernel versions.

>
> static bool can_steal_fallback(unsigned int order, int start_mt)
> {
> 	if (order >= pageblock_order / 2 ||
> 		start_mt == MIGRATE_RECLAIMABLE ||
> 		page_group_by_mobility_disabled)
> 		return true;
>
> 	return false;
> }
>
> Yet, I still see UNMOVABLE growing to the thousands within minutes,
> whereas 3.10 didn't reach those numbers even after days of uptime.
>
> Okay, that wasn't it. However, there is something fishy going on,
> because I see extfrag traces like these:
>
> <idle>-0 [006] d.s. 1110.217281: mm_page_alloc_extfrag: page=ffffea0064142000 pfn=26235008 alloc_order=3 fallback_order=3 pageblock_order=9 alloc_migratetype=0 fallback_migratetype=2 fragmenting=1 change_ownership=1
>
> enum {
> 	MIGRATE_UNMOVABLE,
> 	MIGRATE_MOVABLE,
> 	MIGRATE_RECLAIMABLE,
> 	MIGRATE_PCPTYPES, /* the number of types on the pcp lists */
> 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
> 	...
> };
>
> This is an UNMOVABLE order-3 allocation falling back to RECLAIMABLE.
> According to can_steal_fallback(), this allocation shouldn't steal the
> pageblock, yet change_ownership=1 indicates the block is UNMOVABLE.
>
> Who converted it? I wonder if there is a bug in ownership management,
> and there was an UNMOVABLE block on the RECLAIMABLE freelist from the
> beginning. AFAICS we never validate list/mt consistency anywhere.

According to my code review, that is possible. When stealing happens,
we move those buddy pages to the buddy list of the currently requested
migratetype. If an allocation request of another migratetype then
comes in and steals from the buddy list of the previously requested
migratetype, change_ownership will show '1' even though no ownership
change took place.
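
To make the sequence concrete, here is a rough userspace sketch. It is
a toy model, not kernel code: every toy_* name is made up for
illustration. It rests on two assumptions of mine. First, a steal
retags the pageblock only when at least half of it is free. Second,
the trace flag is computed by comparing the requested migratetype with
the pageblock's current tag. The 64 free pages are an arbitrary value
below half a block.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_mt { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE };

#define TOY_PAGEBLOCK_ORDER 9

/* Hypothetical toy state: one pageblock and the free pages inside it. */
struct toy_block {
	enum toy_mt block_mt;    /* tag a pageblock lookup would return */
	enum toy_mt freelist_mt; /* free list the block's free pages sit on */
	unsigned int free_pages; /* free base pages in the block */
};

/* Same policy as the can_steal_fallback() variant quoted above
 * (page_group_by_mobility_disabled left out). */
static bool toy_can_steal(unsigned int order, enum toy_mt start_mt)
{
	return order >= TOY_PAGEBLOCK_ORDER / 2 || start_mt == TOY_RECLAIMABLE;
}

/* Stealing moves the free pages to the requester's list. Assumption:
 * the block tag follows only when at least half of the block is free. */
static void toy_steal(struct toy_block *b, enum toy_mt start_mt)
{
	b->freelist_mt = start_mt;
	if (b->free_pages >= (1u << (TOY_PAGEBLOCK_ORDER - 1)))
		b->block_mt = start_mt;
}

/* Assumption: the trace flag compares requested type and block tag. */
static int toy_change_ownership(const struct toy_block *b, enum toy_mt alloc_mt)
{
	return b->block_mt == alloc_mt;
}

/* Replays both steps; returns the flag the second fallback reports. */
static int toy_replay(void)
{
	struct toy_block b = { TOY_UNMOVABLE, TOY_UNMOVABLE, 64 };

	/* Step 1: a RECLAIMABLE request steals. 64 < 256 free pages,
	 * so the pages change lists but the tag stays UNMOVABLE. */
	assert(toy_can_steal(0, TOY_RECLAIMABLE));
	toy_steal(&b, TOY_RECLAIMABLE);

	/* Step 2: an UNMOVABLE order-3 request falls back to the
	 * RECLAIMABLE list and finds a page from this block. */
	assert(b.freelist_mt == TOY_RECLAIMABLE);
	assert(!toy_can_steal(3, TOY_UNMOVABLE)); /* 3 < 9 / 2 */
	return toy_change_ownership(&b, TOY_UNMOVABLE);
}
```

In this model toy_replay() returns 1 although step 2 never retags the
block, which would match the trace above if both assumptions hold for
the kernel in question.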

Thanks.