Re: [PATCH 03/16] sched/fair: Disregard idle task wakee_flips in wake_wide

From: Mike Galbraith
Date: Mon May 23 2016 - 11:42:27 EST


On Mon, 2016-05-23 at 15:10 +0100, Morten Rasmussen wrote:
> On Mon, May 23, 2016 at 03:00:46PM +0200, Mike Galbraith wrote:
> > On Mon, 2016-05-23 at 13:00 +0100, Morten Rasmussen wrote:
> >
> > > The problem then seems to be distinguishing between truly idle and
> > > busy handling interrupts. The issue that I observe is that wake_wide()
> > > likes pushing tasks around in lightly loaded scenarios, which isn't
> > > desirable for power management. Selecting the same cpu again may let
> > > the others reach deeper C-states.
> > >
> > > With that in mind I will see if I can do better. Suggestions are welcome :-)
> >
> > None here. For big boxen that are highly idle, you'd likely want to
> > shut down nodes and consolidate load, but OTOH, all that slows response
> > to bursts, which I hate. I prefer race-to-idle and letting power gating
> > do its job. If I had a server farm with enough capacity vs load
> > variability to worry about, I suspect I'd become highly interested in
> > routing.
>
> I don't disagree for systems of that scale, but at the other end of the
> spectrum it is a single SoC we are trying to squeeze the best possible
> mileage out of. That implies optimizing for power gating, reaching deeper
> C-states when possible by consolidating idle time and grouping idle
> cpus. Migrating tasks unnecessarily doesn't help us achieve that,
> unfortunately :-(

Yup, the goals are pretty much mutually exclusive. For your goal, you
want more allocator-like behavior, where stacking tasks is bad only
once there's too much overlap (i.e. latency; defining that is hard), and
allocation always happens in the same order (expand rightward or such
for the general case, adding big.LITTLE complexity for ARM). For mine,
the current behavior is good: avoid stacking like the plague.

-Mike