Re: [PATCH 06/16] sched: Disable WAKE_AFFINE for asymmetric configurations

From: Morten Rasmussen
Date: Wed May 25 2016 - 05:11:27 EST


On Tue, May 24, 2016 at 05:53:27PM +0200, Vincent Guittot wrote:
> On 24 May 2016 at 17:02, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> > On Tue, May 24, 2016 at 03:52:00PM +0200, Vincent Guittot wrote:
> >> On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> >> > On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:
> >> >> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> >> >> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:
> >> >> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> >> >> >> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:
> >> >> >> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote:
> >> >> >> >> > If the system has cpus of different compute capacities (e.g. big.LITTLE)
> >> >> >> >> > let affine wakeups be constrained to cpus of the same type.
> >> >> >> >>
> >> >> >> >> Can you explain why you don't want wake affine with cpus with
> >> >> >> >> different compute capacity ?
> >> >> >> >
> >> >> >> > I should have made the overall idea a bit more clear. The idea is to
> >> >> >> > deal with cross-capacity migrations in the find_idlest_{group, cpu}()
> >> >> >> > path so we don't have to touch select_idle_sibling().
> >> >> >> > select_idle_sibling() is critical for wake-up latency, and I assumed
> >> >> >> > that people wouldn't like adding extra overhead in there to deal with
> >> >> >> > capacity and utilization.
> >> >> >>
> >> >> >> So this means that we will never use the quick path of
> >> >> >> select_idle_sibling for cross capacity migration but always the one
> >> >> >> with extra overhead?
> >> >> >
> >> >> > Yes. select_idle_sibling() is only used to choose among equal capacity
> >> >> > cpus (capacity_orig).
> >> >> >
> >> >> >> Patch 9 adds more tests for enabling wake_affine path. Can't it also
> >> >> >> be used for cross capacity migration ? so we can use wake_affine if
> >> >> >> the task or the cpus (even with different capacity) doesn't need this
> >> >> >> extra overhead
> >> >> >
> >> >> > The test in patch 9 is to determine whether we are happy with the
> >> >> > capacity of the previous cpu, or we should go look for one with more
> >> >> > capacity. I don't see how we can use select_idle_sibling() unmodified
> >> >> > for sched domains containing cpus of different capacity to select an
> >> >> > appropriate cpu. It is just picking an idle cpu, it might have high
> >> >> > capacity or low, it wouldn't care.
> >> >> >
> >> >> > How would you avoid the overhead of checking capacity and utilization of
> >> >> > the cpus and still pick an appropriate cpu?
> >> >>
> >> >> My point is that there is some wake up case where we don't care about
> >> >> the capacity and utilization of cpus even for cross capacity migration
> >> >> and we will never take benefit of this fast path.
> >> >> You have added an extra check for setting want_affine in patch 9 which
> >> >> uses capacity and utilization of cpu to disable this fast path when a
> >> >> task needs more capacity than available. Can't you use this function
> >> >> to disable the want_affine for cross-capacity migration situation that
> >> >> cares of the capacity and need the full scan of sched_domain but keep
> >> >> it enable for other cases ?
> >> >
> >> > It is not clear to me what the other cases are. What kind of cases do
> >> > you have in mind?
> >>
> >> As an example, you have a task A that have to be on a big CPU because
> >> of the requirement of compute capacity, that wakes up a task B that
> >> can run on any cpu according to its utilization. The fast wake up path
> >> is fine for task B whatever prev cpu is.
> >
> > In that case, we will always take the fast path (select_idle_sibling())
> > for task B if wake_wide() allows it, which should be fine.
>
> Even if want_affine is set, the wake up of task B will not use the fast path.
> The affine_sd will not be set because the sched_domain, which has
> both cpus, will not have the SD_WAKE_AFFINE flag according to this
> patch, will it?
> So task B can't use the fast path even though nothing prevents it from
> benefiting from it.
>
> Am I missing something ?

No, I think you are right. Very good point. The cpumask test with
sched_domain_span() will of course return false. So yes, in this case the
slow path is taken. It isn't wrong as such, just slower for asymmetric
capacity systems :-)
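
To make the failure mode concrete, here is a rough user-space model of the
affine domain lookup. It is only a sketch: struct toy_domain,
find_affine_sd(), the bitmask span and the flag value are all made up for
illustration and are not the kernel's definitions. It assumes a topology
with cpus 0-3 little and cpus 4-7 big, and that the patch clears
SD_WAKE_AFFINE only on the domain spanning both clusters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SD_WAKE_AFFINE 0x1u	/* illustrative value, not the kernel's */
#define NR_CPUS 8		/* assumed: cpus 0-3 little, cpus 4-7 big */

/* Toy stand-in for struct sched_domain; span is a bitmask of cpus. */
struct toy_domain {
	unsigned int span;
	unsigned int flags;
	const struct toy_domain *parent;
};

static bool span_has_cpu(const struct toy_domain *sd, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (sd->span >> cpu) & 1u;
}

/*
 * Walk up from the waker's lowest domain and return the first domain
 * that has SD_WAKE_AFFINE set and spans prev_cpu. NULL means no affine
 * domain was found, i.e. the wakeup falls through to the slow path.
 */
static const struct toy_domain *
find_affine_sd(const struct toy_domain *sd, int prev_cpu, bool want_affine)
{
	for (; sd; sd = sd->parent) {
		if (want_affine && (sd->flags & SD_WAKE_AFFINE) &&
		    span_has_cpu(sd, prev_cpu))
			return sd;
	}
	return NULL;
}

/* Asymmetric case: the top level spans both clusters, flag cleared. */
static const struct toy_domain die_asym = { 0xffu, 0, NULL };
static const struct toy_domain mc_big_asym = { 0xf0u, SD_WAKE_AFFINE, &die_asym };

/* Symmetric case for comparison: the flag is set at every level. */
static const struct toy_domain die_sym = { 0xffu, SD_WAKE_AFFINE, NULL };
static const struct toy_domain mc_big_sym = { 0xf0u, SD_WAKE_AFFINE, &die_sym };
```

With task A waking task B from a big cpu and B's prev_cpu being little
cpu 0, find_affine_sd(&mc_big_asym, 0, true) finds nothing: the big
cluster domain does not span cpu 0, and the domain that does span it has
no SD_WAKE_AFFINE. The same call against mc_big_sym returns the top-level
domain, which is the fast path Vincent is asking about.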

It is clearly not as optimized for asymmetric capacity systems as it
could be, but my focus was to not ruin existing behaviour and minimize
overhead for others. There are a lot of different routes through those
conditions in the first half of select_task_rq_fair() that aren't
obvious. I worry that some users depend on them and that I don't
see/understand all of them.

If people agree on changing things, it is fine with me. I just tried to
avoid getting the patches shot down on that account ;-)