Re: [RFC][PATCH 2/2] sched: Enqueue tasks on a cpu with only SCHED_IDLE tasks

From: Quentin Perret
Date: Mon Nov 26 2018 - 07:37:52 EST


Hi Viresh,

On Monday 26 Nov 2018 at 16:50:24 (+0530), Viresh Kumar wrote:
> The scheduler tries to schedule a newly wakeup task on an idle CPU to
> make sure the new task gets chance to run as soon as possible, for
> performance reasons.
>
> The SCHED_IDLE scheduling policy is used for tasks which have the lowest
> priority and there is no hurry in running them. If all the tasks
> currently enqueued on a CPU have their policy set to SCHED_IDLE, then
> any new task (non SCHED_IDLE) enqueued on that CPU should normally get a
> chance to run immediately. This patch takes advantage of this to save
> power in some cases by avoiding waking up an idle CPU (which may be in
> some deep idle state) and enqueuing the new task on a CPU which only has
> SCHED_IDLE tasks.

So, avoiding waking up a CPU isn't always good for energy. You may
prefer to spread tasks in order to keep the OPP low, for example. What
you're trying to achieve here can be actively harmful for both energy
and performance in some cases, I think.

Also, packing will reduce your chances of going cluster idle (yes,
you're not guaranteed to go cluster idle if you spread either, depending
on how the tasks align in time, but at least there's a chance). So, even
from the idle perspective it's not obvious we actually want to do that.

And finally, the placement that this patch tries to achieve is
inherently unbalanced IIUC. So, unless you hide this behind the EAS
static key, you'll need to make sure the periodic/idle load balance code
doesn't kill all the work you do in the wake-up path. So I'm not sure
this patch really works in practice in its current state.

Now, I think you have a point in saying we could possibly be a bit
smarter with the way we deal with SCHED_IDLE tasks, especially if they
are going to be used more (is that a certainty, BTW?). I'm just not
entirely convinced by the 'power' argument yet.

Maybe there is something we could do here: if, say, we need to schedule
a SCHED_NORMAL task and all CPUs have roughly the same load and/or
utilization numbers, and one of those CPUs is busy running only
SCHED_IDLE tasks, we should pick that one first, since we know for a
fact it's not running anything important.
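
To make that tie-break concrete, here is a rough userspace sketch. It is
not kernel code and not a proposed patch: struct cpu_snapshot,
runs_only_sched_idle(), pick_cpu() and the 'margin' tolerance are all
made up for illustration, and "roughly the same" is assumed to mean
"within 'margin' of the least-utilized CPU".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU snapshot; these are NOT the kernel's structures. */
struct cpu_snapshot {
	unsigned int nr_running;     /* all runnable tasks on this CPU */
	unsigned int nr_idle_policy; /* how many of those are SCHED_IDLE */
	unsigned long util;          /* utilization, e.g. on a 0..1024 scale */
};

/* True if the CPU has work queued and all of it is SCHED_IDLE. */
static bool runs_only_sched_idle(const struct cpu_snapshot *c)
{
	return c->nr_running > 0 && c->nr_running == c->nr_idle_policy;
}

/*
 * Pick a CPU for a waking SCHED_NORMAL task.
 *
 * Default choice: the least-utilized CPU. If some CPU within 'margin'
 * of that minimum runs only SCHED_IDLE tasks, prefer it, since the new
 * task would preempt nothing important there.
 */
static int pick_cpu(const struct cpu_snapshot *cpus, int nr_cpus,
		    unsigned long margin)
{
	int i, least = 0;

	assert(nr_cpus > 0);

	for (i = 1; i < nr_cpus; i++)
		if (cpus[i].util < cpus[least].util)
			least = i;

	/* util >= minimum for every CPU, so the subtraction cannot wrap. */
	for (i = 0; i < nr_cpus; i++)
		if (cpus[i].util - cpus[least].util <= margin &&
		    runs_only_sched_idle(&cpus[i]))
			return i;

	return least;
}
```

Note that the SCHED_IDLE preference only kicks in when the numbers are
close; if the system is clearly imbalanced, the sketch falls back to the
least-utilized CPU, so it doesn't pack on purpose.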

What do you think?

Thanks,
Quentin