RE: [Linuxarm] Re: [PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2

From: Song Bao Hua (Barry Song)
Date: Thu Feb 18 2021 - 17:09:36 EST




> -----Original Message-----
> From: Valentin Schneider [mailto:valentin.schneider@xxxxxxx]
> Sent: Friday, February 19, 2021 1:41 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>; Peter Zijlstra
> <peterz@xxxxxxxxxxxxx>
> Cc: vincent.guittot@xxxxxxxxxx; mgorman@xxxxxxx; mingo@xxxxxxxxxx;
> dietmar.eggemann@xxxxxxx; morten.rasmussen@xxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx; xuwei (O)
> <xuwei5@xxxxxxxxxx>; Liguozhu (Kenneth) <liguozhu@xxxxxxxxxxxxx>; tiantao (H)
> <tiantao6@xxxxxxxxxxxxx>; wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Zengtao (B)
> <prime.zeng@xxxxxxxxxxxxx>; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
> guodong.xu@xxxxxxxxxx; Meelis Roos <mroos@xxxxxxxx>
> Subject: [Linuxarm] Re: [PATCH v2] sched/topology: fix the issue groups don't
> span domain->span for NUMA diameter > 2
>
>
> Hi Barry,
>
> On 18/02/21 09:17, Song Bao Hua (Barry Song) wrote:
> > Hi Valentin,
> >
> > I understand Peter's concern is that the local group has a different
> > size from the remote groups. Is this patch resolving Peter's concern?
> > To me, it seems not :-)
> >
>
> If you remove the '&& i != cpu' in build_overlap_sched_groups() you get that,
> but then you also get some extra warnings :-)
>
> Now yes, should_we_balance() only matters for the local group. However I'm
> somewhat wary of messing with the local groups; for one it means you would have
> more than one tl now accessing the same sgc->next_update, sgc->{min,
> max}capacity, sgc->group_imbalance (as Vincent had pointed out).
>
> By ensuring only remote (i.e. !local) groups are modified (which is what your
> patch does), we absolve ourselves of this issue, which is why I prefer this
> approach ATM.

Yep. The grandchild approach still seems to be the feasible way forward for
the moment.
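
For reference, the gist of the grandchild approach is roughly the below (a
simplified sketch of the idea, not the exact v2 hunk): in
build_overlap_sched_groups(), a *remote* sibling whose child domain would
span outside the domain being built is replaced by a lower-level
("grandchild") domain whose span does fit, while the local group (i == cpu)
is left untouched:

        /* sketch, inside the for_each_cpu_wrap(i, span, cpu) loop */
        if (i != cpu) {
                /*
                 * Descend while the sibling's child domain would span
                 * outside 'span', so the group built from it stays
                 * within the domain being built.
                 */
                while (sibling->child &&
                       !cpumask_subset(sched_domain_span(sibling->child),
                                       span))
                        sibling = sibling->child;
        }

        sg = build_group_from_child_sched_domain(sibling, cpu);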

>
> > Though I don’t understand why different group sizes will be harmful
> > since all groups are calculating avg_load and group_type based on
> > their own capacities. Thus, for a smaller group, its capacity would be
> > smaller.
> >
> > Is it because a bigger group has relatively less chance to pull, so
> > load balancing will be completed more slowly while small groups have
> > high load?
> >
>
> Peter's point is that, if at a given tl you have groups that look like
>
> g0: 0-4, g1: 5-6, g2: 7-8
>
> Then g0 is half as likely to pull tasks with load_balance() as g1 or g2 (due
> to the group size vs should_we_balance())

Yep. The difference is that g1 and g2 won't be the local group of any CPU
at this tl. The smaller groups g1 and g2 are only ever remote groups, so
should_we_balance() doesn't matter for them.
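
To spell that out: should_we_balance() only ever walks the balance mask of
env->sd->groups, i.e. the local group of the CPU doing the balance, so the
sizes of the remote groups never enter that decision. Roughly (trimmed from
kernel/sched/fair.c around v5.11, sketch only):

        static int should_we_balance(struct lb_env *env)
        {
                struct sched_group *sg = env->sd->groups; /* local group */
                int cpu;

                /* newly idle balancing is allowed from any CPU */
                if (env->idle == CPU_NEWLY_IDLE)
                        return 1;

                /* the first idle CPU of the local group gets to balance... */
                for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
                        if (!idle_cpu(cpu))
                                continue;
                        return cpu == env->dst_cpu;
                }

                /* ...otherwise the group's designated balance CPU does */
                return cpu == group_balance_cpu(sg);
        }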

>
>
> However, I suppose one "trick" to be aware of here is that since your patch
> *doesn't* change the local group, we do have e.g. on CPU0:
>
> [ 0.374840] domain-2: span=0-5 level=NUMA
> [ 0.375054] groups: 0:{ span=0-3 cap=4003 }, 4:{ span=4-5 cap=1988 }
>
> *but* on CPU4 we get:
>
> [ 0.387019] domain-2: span=0-1,4-7 level=NUMA
> [ 0.387211] groups: 4:{ span=4-7 cap=3984 }, 0:{ span=0-1 cap=2013 }
>
> IOW, at a given tl, all *local* groups have /roughly/ the same size and thus
> similar pull probability (it took me writing this mail to see it that way).
> So perhaps this is all fine already?

Yep. Since should_we_balance() only matters for local groups, and the
grandchild approach doesn't change the size of any local group, all local
groups still get a similar pull probability at this topology level.

Since we still prefer the grandchild approach ATM, and if Peter has no
further concern about the unequal sizes of local and remote groups, I would
be glad to send a v4 of the grandchild approach with a rewritten changelog
that explains the update issue of sgc->next_update, sgc->{min,max}_capacity
and sgc->group_imbalance that Vincent pointed out, and that also notes the
local groups are untouched and therefore stay equally sized.

Thanks
Barry