Re: [RFC][PATCH 2/2] sched: Enqueue tasks on a cpu with only SCHED_IDLE tasks

From: Viresh Kumar
Date: Tue Nov 27 2018 - 05:24:48 EST


Hi Quentin,

On 26-11-18, 12:37, Quentin Perret wrote:
> On Monday 26 Nov 2018 at 16:50:24 (+0530), Viresh Kumar wrote:
> > The scheduler tries to schedule a newly wakeup task on an idle CPU to
> > make sure the new task gets chance to run as soon as possible, for
> > performance reasons.
> >
> > The SCHED_IDLE scheduling policy is used for tasks which have the lowest
> > priority and there is no hurry in running them. If all the tasks
> > currently enqueued on a CPU have their policy set to SCHED_IDLE, then
> > any new task (non SCHED_IDLE) enqueued on that CPU should normally get a
> > chance to run immediately. This patch takes advantage of this to save
> > power in some cases by avoiding waking up an idle CPU (which may be in
> > some deep idle state) and enqueuing the new task on a CPU which only has
> > SCHED_IDLE tasks.
>
> So, avoiding to wake-up a CPU isn't always good for energy. You may
> prefer to spread tasks in order to keep the OPP low, for example. What
> you're trying to achieve here can be actively harmful for both energy
> and performance in some cases, I think.

Yeah, we may end up packing SCHED_IDLE tasks to a single CPU in this case.

We know that dynamic energy is significantly more than static energy and that is
what we should care more about. Yes, higher OPP should be avoided (apart from
performance reasons), but isn't it better (for power) to run a single CPU at
somewhat higher OPP (1GHz ?) instead of running four of them at lower OPPs (500
MHz) ?

> Also, packing will reduce your chances to go cluster idle (yes you're
> not guaranteed to go cluster idle either if you spread depending how
> the tasks align in time, but at least there's a chance). So, even from
> the idle perspective it's not obvious we actually want to do that.

But do we really want to fire all CPUs of a cluster to finish the work earlier
and go cluster idle ? We don't really believe in race-to-idle as that's why we
have the whole DVFS thing here, right ?

> And finally, the placement that this patch tries to achieve is
> inherently unbalanced IIUC. So, unless you hide this behind the EAS
> static key, you'll need to make sure the periodic/idle load balance code
> doesn't kill all the work you do in the wake-up path. So I'm not sure
> this patch really works in practice in its current state.

True, I intentionally left the load-balancer code as is to avoid larger diff for
now. The idea was to get more feedback on the whole thing before investing too
much on it.

> Now, I think you have a point by saying we could possibly be a bit
> smarter with the way we deal with SCHED_IDLE tasks, especially if they
> are going to be used more (is that a certainty BTW ?), I'm just not
> entirely convinced with the 'power' argument yet.

Todd confirmed earlier (privately) that most (?) of the android background tasks
can actually be migrated to use SCHED_IDLE stuff as there is no urgency in
scheduling them normally.

@Todd, can you please provide some inputs here as well ?

> Maybe there is something we could do if, say we need to schedule a
> SCHED_NORMAL task and all CPUs have roughly the same load and/or
> utilization numbers, then if a CPU is busy running SCHED_IDLE tasks we
> should select it in priority since we know for a fact it's not running
> anything important.
>
> What do you think ?

Sure, I am not saying that the approach taken by this patch is the best or the
worst. We need to come up with better policy on how we can benefit from the
SCHED_IDLE policy and that's where I am looking for inputs from all of you.

Thanks for the feedback.

--
viresh