Re: [PATCH v2 8/8] sched/fair: use utilization to select misfit task

From: Vincent Guittot
Date: Fri Aug 02 2019 - 04:29:22 EST


On Thu, 1 Aug 2019 at 18:27, Valentin Schneider
<valentin.schneider@xxxxxxx> wrote:
>
> On 01/08/2019 15:40, Vincent Guittot wrote:
> > @@ -8261,7 +8261,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > * If we have more than one misfit sg go with the
> > * biggest misfit.
> > */
> > - if (sgs->group_misfit_task_load < busiest->group_misfit_task_load)
> > + if (sgs->group_misfit_task_util < busiest->group_misfit_task_util)
> > return false;
>
> I previously said this change would render the maximization useless, but I
> had forgotten one thing: with PELT time scaling, task utilization can go
> above its CPU's capacity.
>
> So if you have two LITTLE CPUs running a busy loop (misfit task) each, the
> one that's been running the longest would have the highest utilization
> (assuming they haven't reached 1024 yet). In other words "utilizations
> above the capacity_margin can't be compared" doesn't really stand.
>
> Still, maximizing load would lead us there. Furthermore, if we have to pick
> between two rqs with misfit tasks, I still believe we should pick the one
> with the highest load, not the highest utilization.
>
> We could keep load and fix the issue of detaching the wrong task with
> something like:
>
> -----8<-----
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 53e64a7b2ae0..bfc2b624ee98 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7489,12 +7489,8 @@ static int detach_tasks(struct lb_env *env)
> case migrate_misfit:
> load = task_h_load(p);
>
> - /*
> - * utilization of misfit task might decrease a bit
> - * since it has been recorded. Be conservative in the
> - * condition.
> - */
> - if (load < env->imbalance)
> + /* This is not a misfit task */
> + if (task_fits_capacity(p, capacity_of(env->src_cpu)))
> goto next;

This could be a solution to make sure we pull only the misfit task
while keeping load as the metric.
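
For reference, task_fits_capacity() is currently just the
capacity_margin check, so the detach path can keep task_h_load() for
the imbalance accounting and use the fits test only to decide whether
p is still a misfit (sketch of the mainline helper, quoted from
memory):

	/*
	 * Margin used when comparing utilization with CPU capacity,
	 * i.e. a task fits if util + ~20% headroom stays below capacity.
	 */
	static unsigned long capacity_margin = 1280;

	static inline int task_fits_capacity(struct task_struct *p, long capacity)
	{
		return capacity * 1024 > task_util_est(p) * capacity_margin;
	}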

>
> env->imbalance = 0;
> ----->8-----
>
> However what would be *even* better IMO would be:
>
> -----8<-----
> @@ -8853,6 +8853,7 @@ voluntary_active_balance(struct lb_env *env)
> return 1;
> }
>
> + /* XXX: make sure current is still a misfit task? */
> if (env->balance_type == migrate_misfit)
> return 1;
>
> @@ -8966,6 +8967,20 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> env.src_rq = busiest;
>
> ld_moved = 0;
> +
> + /*
> + * Misfit tasks are only misfit if they are currently running, see
> + * update_misfit_status().
> + *
> + * - If they're not running, we'll get an opportunity at wakeup to
> + * migrate them to a bigger CPU.
> + * - If they're running, we need to active balance them to a bigger CPU.
> + *
> + * Do the smart thing and go straight for active balance.
> + */
> + if (env.balance_type == migrate_misfit)
> + goto active_balance;
> +

This looks ugly and adds a new bypass, which this patchset tries to
remove. It also doesn't work if the misfit task has been preempted by
another task during the load balance and is sitting on the runqueue
waiting to run.
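
To illustrate the problem: rq->misfit_task_load tracks whatever task
update_misfit_status() last saw running on the rq, e.g. from the tick
(sketch of the mainline helper, quoted from memory):

	static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
	{
		if (!static_branch_unlikely(&sched_asym_cpucapacity))
			return;

		/* p is rq's currently running task, or NULL when idle */
		if (!p) {
			rq->misfit_task_load = 0;
			return;
		}

		if (task_fits_capacity(p, capacity_of(cpu_of(rq)))) {
			rq->misfit_task_load = 0;
			return;
		}

		rq->misfit_task_load = task_h_load(p);
	}

So if the misfit task has been preempted since it was detected, active
balancing would push busiest->curr, i.e. migrate the preempting task
instead of the misfit one.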


> if (busiest->nr_running > 1) {
> /*
> * Attempt to move tasks. If find_busiest_group has found
> @@ -9074,7 +9089,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> goto out_all_pinned;
> }
> }
> -
> +active_balance:
> if (!ld_moved) {
> schedstat_inc(sd->lb_failed[idle]);
> /*
> ----->8-----