Re: [PATCH v5 09/14] sched: Add over-utilization/tipping point indicator

From: Vincent Guittot
Date: Mon Aug 06 2018 - 04:41:01 EST


On Fri, 3 Aug 2018 at 17:55, Quentin Perret <quentin.perret@xxxxxxx> wrote:
>
> On Friday 03 Aug 2018 at 15:49:24 (+0200), Vincent Guittot wrote:
> > On Fri, 3 Aug 2018 at 10:18, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> > >
> > > On Friday 03 Aug 2018 at 09:48:47 (+0200), Vincent Guittot wrote:
> > > > On Thu, 2 Aug 2018 at 18:59, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> > > > I'm not really concerned about re-enabling load balance, but rather
> > > > that the packing of tasks onto a few cpus/clusters that EAS tries to
> > > > achieve can be broken by every new task.
> > >
> > > Well, re-enabling load balance immediately would break the nice placement
> > > that EAS found, because it would shuffle all tasks around and break the
> > > packing strategy. Letting that sole new task go in find_idlest_cpu()
> >
> > Sorry, I was not clear in my explanation. Re-enabling load balance
> > would of course be a problem. I wanted to say that there is little
> > chance that this will re-enable load balancing immediately and break
> > EAS, so I'm not worried about that case. I'm only concerned about the
> > new task being placed outside the EAS policy.
> >
> > For example, if you run on hikey960 the simple script below, which
> > can't really be seen as a fork bomb IMHO, you will see threads
> > scheduled on big cores every 0.5 seconds whereas everything should be
> > packed on the little cores.
>
> I guess that also depends on what's running on the little cores, but I
> see your point.

In my case, the system was idle and nothing other than this script was running.

>
> I think we're discussing two different things right now:
> 1. Should forkees go in find_energy_efficient_cpu()?
> 2. Should forkees have an initial util_avg of 0 when EAS is enabled?

It's the same topic: how should EAS treat a newly created task?

For now, we let the "performance" mode select a CPU. This CPU will
most probably be the worst CPU from an EAS point of view, because it's
the idlest CPU in the idlest group, which is the opposite of what EAS
tries to do.

The current behavior is: for every new task, the CPU selection is done
assuming it's a heavy task with the maximum possible load_avg, and it
looks for the idlest CPU. This means that on a lightly loaded system,
the scheduler will most probably select an idle big core.
The utilization of the new task is then set to half of the remaining
capacity of the selected CPU, which means that the idler the selected
CPU, the bigger the task's initial utilization. This can easily be half
a big core, which can be larger than the maximum capacity of a little
core, as on hikey960. Then util_est will keep track of this value for a
while, which makes the task look like a big one.
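
For reference, here is a condensed sketch of the two steps described
above, based on init_entity_runnable_average() and
post_init_entity_util_avg() in kernel/sched/fair.c (group-entity
handling and the cfs_rq-weighted case are omitted for brevity):

	/* At fork: the new task starts with the maximum possible load,
	 * so CPU selection treats it as a heavy task. */
	void init_entity_runnable_average(struct sched_entity *se)
	{
		struct sched_avg *sa = &se->avg;

		memset(sa, 0, sizeof(*sa));
		/* Seen as heavy until it stabilizes to its real load. */
		sa->load_avg = scale_load_down(se->load.weight);
		/* util_avg stays 0 here; it is set after CPU selection. */
	}

	/* After CPU selection: util_avg is sized to the chosen CPU. */
	void post_init_entity_util_avg(struct sched_entity *se)
	{
		struct cfs_rq *cfs_rq = cfs_rq_of(se);
		struct sched_avg *sa = &se->avg;
		long cap = (long)(SCHED_CAPACITY_SCALE -
				  cfs_rq->avg.util_avg) / 2;

		/* Half of the CPU's spare capacity: the idler the CPU,
		 * the bigger the new task's initial utilization. */
		if (cap > 0)
			sa->util_avg = cap;
	}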

>
> For 1, that would mean all forkees go on prev_cpu. I can see how that
> can be more energy-efficient in some use-cases (the one you described
> for example), but that also has drawbacks. Placing the task on a big
> CPU can have an energy cost, but that should also help the task build
> its utilization faster, which is what we want to make smart decisions

With the current behavior, little tasks are seen as big for a long
time, which does not really help the task build its utilization faster
IMHO.

> with EAS. Also, it isn't always true that going on the little CPUs is
> more energy efficient, only the Energy Model can tell. There is just no

Selecting big or little is not the problem here. The problem is that
we don't use the Energy Model, so we will most probably make the wrong
choice. Nevertheless, putting the task on a big core is clearly the
wrong choice in the case I mentioned above (shell script on hikey960).

> perfect solution, so I'm still not fully decided on that one ...
>
> For 2, I'm a little bit more reluctant, because that has more
> implications ... That could probably harm some fairly standard use
> cases (a simple app-launch for example). Enqueueing something new on a
> CPU would go unnoticed, which might be fine for a very small task, but
> probably a major issue if the task is actually big. I'd be more
> comfortable with 2 only if we also speed up the PELT half-life TBH ...
>
> Is there a 3 that I missed ?

Having something in the middle, like taking into account the load
and/or utilization of the parent, in order to mitigate big tasks
starting with small utilization and small tasks starting with big
utilization.
It's probably not perfect, because big tasks can create small ones and
vice versa, but if big tasks are already running, assuming that the new
one is also big should have less power impact, as we are already
consuming power for the current bigs. Conversely, if only little tasks
are running, assuming that the new task is little will not harm power
consumption unnecessarily.
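
To illustrate, here is a purely hypothetical sketch of that middle
ground, seeding the child's utilization from its parent instead of
from the chosen CPU's spare capacity (this is not actual kernel code;
the function name and the clamping policy are my assumptions):

	/* Hypothetical: called at fork time, where current is the
	 * parent. Seed the child's util_avg from the parent's, on the
	 * assumption that big tasks tend to spawn big tasks and
	 * little ones little tasks. */
	static void init_util_avg_from_parent(struct task_struct *p)
	{
		unsigned long parent_util =
			READ_ONCE(current->se.avg.util_avg);

		/* Clamp so that a burst of forks from a big parent
		 * cannot inflate utilization without bound. */
		p->se.avg.util_avg = min_t(unsigned long, parent_util,
					   SCHED_CAPACITY_SCALE / 2);
	}

This way, a parent that has already built up a big utilization hands a
big initial value to its child, while the short-lived children of the
hikey960 shell script start small.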

My main concern is that by making no choice, you effectively make the
most power-consuming choice, which is a bit awkward for a policy that
aims to minimize power consumption.

Regards,
Vincent
>
> Thanks,
> Quentin