Re: [PATCH v2 05/13] sched: Enable SD_BALANCE_WAKE for asymmetric capacity systems

From: Morten Rasmussen
Date: Mon Jul 11 2016 - 06:35:50 EST

On Mon, Jul 11, 2016 at 12:04:49PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 06:03:16PM +0100, Morten Rasmussen wrote:
> > Systems with the SD_ASYM_CPUCAPACITY flag set indicate that sched_groups
> > at this level or below do not include cpus of all capacities available
> > (e.g. group containing little-only or big-only cpus in big.LITTLE
> > systems). It is therefore necessary to put in more effort in finding an
> > appropriate cpu at task wake-up by enabling balancing at wake-up.
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6397,6 +6397,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
> >  	 * Convert topological properties into behaviour.
> >  	 */
> >
> > +	if (sd->flags & SD_ASYM_CPUCAPACITY)
> > +		sd->flags |= SD_BALANCE_WAKE;
> > +
>
> So I'm a bit confused on the exact requirements for this; as also per
> the previous patch.
>
> Should all sched domains get BALANCE_WAKE if one (typically the top)
> domain has ASYM_CAP set?
>
> The previous patch set it on the actual asym one and one below that, but
> what if there's more levels below that? Imagine ARM gaining SMT or
> somesuch. Should not then that level also get BALANCE_WAKE in order to
> 'correctly' place light/heavy tasks?
>
> IOW, are you trying to fudge the behaviour semantics by creating 'weird'
> ASYM_CAP rules instead of having a more complex behaviour rule here?

That is one possible way of describing it :-)

The proposed semantic is to set ASYM_CAP at all levels starting from the
bottom up until you have sched_groups containing all types of cpus
available in the system, or reach the top level.

The fundamental reason for these weird semantics is that we somehow need
to know at the lower levels, which may be capacity symmetric, if we need
to consider balancing at a higher level to see the asymmetry or not.

If the flag isn't set bottom up we need some other way of knowing if the
system is asymmetric, or we would have to go look for the flag further
up the sched_domain hierarchy each time.
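
To make the trade-off concrete, here is a minimal sketch of the two
lookups. It is a toy model, not kernel code: struct toy_domain, the
TOY_ASYM_CAP bit and both helper names are made up for illustration, and
only the idea of a parent pointer linking each level to the one above it
is borrowed from the real sched_domain hierarchy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for SD_ASYM_CPUCAPACITY. */
#define TOY_ASYM_CAP 0x1u

/* Toy stand-in for a sched_domain: just a parent link and flags. */
struct toy_domain {
	struct toy_domain *parent;	/* next level up, NULL at the top */
	unsigned int flags;
};

/* Flag set bottom up: the lowest domain alone answers the question. */
static bool toy_asym_flag_bottom_up(const struct toy_domain *lowest)
{
	return lowest && (lowest->flags & TOY_ASYM_CAP);
}

/* Flag only on the level that sees the asymmetry: climb on every query. */
static bool toy_asym_flag_walk_up(const struct toy_domain *sd)
{
	for (; sd; sd = sd->parent)
		if (sd->flags & TOY_ASYM_CAP)
			return true;
	return false;
}
```

With the flag present at the lowest level the check is a single test;
without it, each query costs one step per level of the hierarchy.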

I'm not saying this is the perfect solution, and I'm happy to discuss it.

The example in the previous patch has the flag set on both levels, as we
have two clusters of different cpus and therefore have to go to the top
to 'see' all the types of cpus we have in the system.

If you add SMT, you would add a third level at the bottom with
ASYM_CAP set as well, as you still have to balance at the top level to
have the full range of choice of cpu type.

Should someone build a system with multiple big.LITTLE cluster pairs and
essentially add another sched_domain level on top, then that level
should _not_ have the ASYM_CAP flag set. The sched_groups at this level
would span both big and little cpus of the cluster pair so there is
little reason to expand the search scope at wake-up further.
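
The rule and the examples above can be sketched roughly as follows. This
is a toy model under stated assumptions, not the patch itself: struct
toy_level, the TOY_LITTLE/TOY_BIG bits and toy_set_asym_cap() are
hypothetical names, and each level is reduced to the set of cpu types
found inside one of its sched_groups, as seen from a single cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cpu-type bits for this sketch; not kernel definitions. */
#define TOY_LITTLE 0x1u
#define TOY_BIG    0x2u

/* Toy stand-in for one sched_domain level as seen from a single cpu. */
struct toy_level {
	const char *name;
	unsigned int group_types;	/* cpu types inside one sched_group here */
	bool asym_cap;			/* models SD_ASYM_CPUCAPACITY */
};

/*
 * Walk the levels from bottom (index 0) to top. Flag each level until the
 * sched_groups of a level already contain every cpu type in the system;
 * that level and everything above it stay unflagged.
 */
static void toy_set_asym_cap(struct toy_level *lv, size_t nr,
			     unsigned int all_types)
{
	assert(lv != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		lv[i].asym_cap = false;

	for (size_t i = 0; i < nr; i++) {
		if ((lv[i].group_types & all_types) == all_types)
			break;	/* groups span all types: stop flagging */
		lv[i].asym_cap = true;
	}
}
```

Fed a two-level big.LITTLE hierarchy in which every group holds a single
cpu type, the sketch flags both levels. Adding an SMT level at the bottom
flags that level too. Adding a level on top whose groups hold both big and
little cpus leaves that top level unflagged, and a fully symmetric system
gets no flag at all.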

I hope that makes sense.