Re: [PATCH v3 4/5] sched/pelt: Add a new runnable average signal

From: Vincent Guittot
Date: Thu Feb 20 2020 - 09:36:19 EST


On Wed, 19 Feb 2020 at 21:10, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 19/02/2020 12:55, Vincent Guittot wrote:
> > @@ -740,8 +740,10 @@ void init_entity_runnable_average(struct sched_entity *se)
> > * Group entities are initialized with zero load to reflect the fact that
> > * nothing has been attached to the task group yet.
> > */
> > - if (entity_is_task(se))
> > + if (entity_is_task(se)) {
> > + sa->runnable_avg = SCHED_CAPACITY_SCALE;
>
> So this is a comment that's more related to patch 5, but the relevant bit is
> here. I'm thinking this initialization might be too aggressive wrt load
> balance. This will also give different results between symmetric vs
> asymmetric topologies - a single fork() will make a LITTLE CPU group (at the
> base domain level) overloaded straight away. That won't happen for bigs or on
> symmetric topologies because
>
> // group_is_overloaded()
> (sgs->group_capacity * imbalance_pct) < (sgs->group_runnable * 100)
>
> will be false - it would take more than one task for that to happen (due to
> the imbalance_pct).
>
> So maybe what we want here instead is to mimic what we have for utilization,
> i.e. initialize to half the spare capacity of the local CPU. IOW,
> conceptually something like this:
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 99249a2484b4..762717092235 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -740,10 +740,8 @@ void init_entity_runnable_average(struct sched_entity *se)
> * Group entities are initialized with zero load to reflect the fact that
> * nothing has been attached to the task group yet.
> */
> - if (entity_is_task(se)) {
> - sa->runnable_avg = SCHED_CAPACITY_SCALE;
> + if (entity_is_task(se))
> sa->load_avg = scale_load_down(se->load.weight);
> - }
>
> /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
> }
> @@ -796,6 +794,8 @@ void post_init_entity_util_avg(struct task_struct *p)
> }
> }
>
> + sa->runnable_avg = sa->util_avg;
> +
> if (p->sched_class != &fair_sched_class) {
> /*
> * For !fair tasks do:
> ---
>
> The current approach has the merit of giving some sort of hint to the LB
> that there is a bunch of new tasks that it could spread out, but I fear it
> is too aggressive.

I agree that setting it to SCHED_CAPACITY_SCALE by default is too much
for little cores.
The problem for little cores can be fixed by using the CPU capacity instead:

@@ -796,6 +794,8 @@ void post_init_entity_util_avg(struct task_struct *p)
}
}

+ sa->runnable_avg = cpu_scale;
+
if (p->sched_class != &fair_sched_class) {
/*
* For !fair tasks do:
>
> > sa->load_avg = scale_load_down(se->load.weight);
> > + }
> >
> > /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
> > }