Re: [PATCH v2] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2

From: Peter Zijlstra
Date: Wed Feb 10 2021 - 06:30:19 EST


On Tue, Feb 09, 2021 at 08:58:15PM +0000, Song Bao Hua (Barry Song) wrote:

> > I've finally had a moment to think about this, would it make sense to
> > also break up group: node0+1, such that we then end up with 3 groups of
> > equal size?
>

> Since the sched_domain[n-1] of only a subset of node[m]'s siblings is
> already enough to cover the whole span of node[m]'s sched_domain[n],
> there is no need to scan all of node[m]'s siblings: once
> sched_domain[n] of node[m] has been covered, we can stop making more
> sched_groups. So the number of sched_groups stays small.
>
> So historically, the code has never tried to make the sched_groups
> equal in size, and it permits overlap between the local group and the
> remote groups.
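(To illustrate the covering logic you describe -- this is just a tiny
userspace sketch, not the kernel's build_overlap_sched_groups(), and
the spans are made up -- each sibling contributes its lower-level span
as one group and the walk stops as soon as the domain span is covered,
which is also why groups can overlap and end up with unequal sizes:)

#include <stdio.h>

int main(void)
{
	/* hypothetical lower-level spans of node[m]'s siblings */
	unsigned int sibling_span[] = { 0x0f, 0x3f, 0xc0, 0xf0 };
	unsigned int nr_siblings = sizeof(sibling_span) / sizeof(sibling_span[0]);
	unsigned int domain_span = 0xff;	/* span to be covered */
	unsigned int covered = 0, ngroups = 0;

	for (unsigned int i = 0; i < nr_siblings; i++) {
		if (!(sibling_span[i] & ~covered))
			continue;	/* adds nothing new, skip */

		covered |= sibling_span[i];
		printf("group %u: span 0x%02x, covered 0x%02x\n",
		       ++ngroups, sibling_span[i], covered);

		if (covered == domain_span)
			break;	/* fully covered: stop, later siblings never visited */
	}
	return 0;
}

This makes 3 groups of 4, 6 and 2 CPUs and never looks at the fourth
sibling, matching the "stop once covered" behaviour described above.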

Historically, groups have (typically) always been the same size though.

The reason I asked is that when you get one large group and a bunch of
smaller ones, the load-balancing 'pull' towards the large group is
relatively weaker.

That is, IIRC should_we_balance() ensures only 1 CPU out of the group
continues the load-balancing pass. So if, for example, we have one
group of 4 CPUs and one group of 2 CPUs, then each CPU in the group of
2 gets to pull 1/2 of the time, while each CPU in the group of 4 only
gets to pull 1/4 of the time.
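(A toy simulation of that; this is not the kernel's
should_we_balance(), and the rotating "designated balancer" plus the
group sizes below are made up for illustration -- only one CPU per
group continues the pass, so a given CPU in the smaller group gets to
do the pull more often:)

#include <stdio.h>

int main(void)
{
	int group_size[2] = { 4, 2 };	/* hypothetical group sizes */
	int pulls_by_cpu0[2] = { 0, 0 };
	int ticks = 1000;

	for (int t = 0; t < ticks; t++) {
		for (int g = 0; g < 2; g++) {
			/* only one CPU per group continues the balance pass */
			int designated = t % group_size[g];

			/* count how often CPU0 of each group gets to pull */
			if (designated == 0)
				pulls_by_cpu0[g]++;
		}
	}

	for (int g = 0; g < 2; g++)
		printf("group of %d CPUs: CPU0 pulled %d of %d ticks (~1/%d)\n",
		       group_size[g], pulls_by_cpu0[g], ticks, group_size[g]);

	return 0;
}

That prints 250/1000 for the group of 4 and 500/1000 for the group of
2, i.e. the 1/4 vs 1/2 rates above.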

By making sure all groups are of the same level, and thus of equal size,
this doesn't happen.