Re: [PATCH] sched/fair: improve spreading of utilization
From: Vincent Guittot
Date: Fri Mar 13 2020 - 10:26:35 EST
On Fri, 13 Mar 2020 at 13:55, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Fri, 13 Mar 2020 at 13:42, Valentin Schneider
> <valentin.schneider@xxxxxxx> wrote:
> >
> >
> > On Fri, Mar 13 2020, Valentin Schneider wrote:
> > > On Fri, Mar 13 2020, Vincent Guittot wrote:
> > >>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > >>> > index 3c8a379c357e..97a0307312d9 100644
> > >>> > --- a/kernel/sched/fair.c
> > >>> > +++ b/kernel/sched/fair.c
> > >>> > @@ -9025,6 +9025,14 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > >>> > 		case migrate_util:
> > >>> > 			util = cpu_util(cpu_of(rq));
> > >>> >
> > >>> > +			/*
> > >>> > +			 * Don't try to pull utilization from a CPU with one
> > >>> > +			 * running task. Whatever its utilization, we will fail
> > >>> > +			 * to detach the task.
> > >>> > +			 */
> > >>> > +			if (nr_running <= 1)
> > >>> > +				continue;
> > >>> > +
> > >>>
> > >>> Doesn't this break misfit? If the busiest group is group_misfit_task, it
> > >>> is totally valid for the runqueues to have a single running task -
> > >>> that's the CPU-bound task we want to upmigrate.
> > >>
> > >> group_misfit_task has its dedicated migrate_misfit case
> > >>
> > >
> > > Doh, yes, sorry. I think my rambling on ASYM_PACKING / reduced capacity
> > > migration is still relevant, though.
> > >
> >
> > And with more coffee that's another Doh, ASYM_PACKING would end up as
> > migrate_task. So this only affects the reduced capacity migration, which
>
> Yes, ASYM_PACKING uses migrate_task, and the reduced capacity case
> would use it too and would not be impacted by this patch. I say
> "would" because the original rework of load balance got rid of this
> case. I'm going to prepare a separate fix for this.
After more thought, I think that we are safe for reduced capacity too,
because this is handled in the migrate_load case. In my previous
reply, I was thinking of the case where the rq is not overloaded but the
CPU has reduced capacity, which is not handled. But in that case, we don't
have to force the migration of the task, because there is still enough
capacity; otherwise the rq would be overloaded and we would be back to the
case that is already handled.
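
To make the argument above concrete, here is a minimal userspace sketch,
not the kernel code. All names (rq_overloaded, must_force_migration,
MARGIN_PCT) are hypothetical, and the 117 margin is only an illustrative
value. It assumes "overloaded" means that utilization, scaled by a margin,
exceeds the capacity left on the CPU:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical margin in percent; an illustrative value, not a kernel constant. */
#define MARGIN_PCT 117

/* Hypothetical helper: utilization (plus margin) no longer fits in capacity. */
static bool rq_overloaded(unsigned long util, unsigned long capacity)
{
	assert(capacity > 0);
	return capacity * 100 < util * MARGIN_PCT;
}

/* Hypothetical helper: part of the original capacity is used by something else. */
static bool cpu_has_reduced_capacity(unsigned long capacity,
				     unsigned long capacity_orig)
{
	return capacity < capacity_orig;
}

/*
 * The reasoning above: a CPU with reduced capacity only needs its task
 * forced away when the rq is also overloaded. If it is not overloaded,
 * there is still enough capacity and nothing has to be forced.
 */
static bool must_force_migration(unsigned long util, unsigned long capacity,
				 unsigned long capacity_orig)
{
	if (!cpu_has_reduced_capacity(capacity, capacity_orig))
		return false;
	return rq_overloaded(util, capacity);
}
```

With an original capacity of 1024 reduced to 600, a utilization of 300
still fits and nothing is forced, while a utilization of 600 makes the rq
overloaded, which is the case that is already handled.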
>
> > might be hard to notice in benchmarks.
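
For anyone following the thread, the cases discussed above can be
summarized in a small userspace sketch. This is a simplification under
stated assumptions, not the code in kernel/sched/fair.c: the enum names
mirror the ones mentioned in this thread, while type_for_group() and
skip_rq() are hypothetical helpers, and the group_overloaded to
migrate_load mapping is an assumption based on the discussion above.

```c
#include <stdbool.h>

/* Migration types named in this thread. */
enum migration_type {
	migrate_load,
	migrate_util,
	migrate_task,
	migrate_misfit,
};

/* Only the group types discussed in this thread; the real list is longer. */
enum group_type {
	group_misfit_task,
	group_asym_packing,
	group_overloaded,
};

/* Hypothetical helper: which migration type each discussed case ends up with. */
static enum migration_type type_for_group(enum group_type group)
{
	switch (group) {
	case group_misfit_task:
		return migrate_misfit;	/* dedicated case for misfit tasks */
	case group_asym_packing:
		return migrate_task;	/* ASYM_PACKING migrates a task */
	case group_overloaded:
		return migrate_load;	/* assumption: overloaded rq balances load */
	}
	return migrate_load;
}

/*
 * Hypothetical helper modelling the check added by the patch: only the
 * migrate_util case skips a runqueue with a single running task.
 */
static bool skip_rq(enum migration_type type, unsigned int nr_running)
{
	return type == migrate_util && nr_running <= 1;
}
```

In this model a runqueue with one running task is skipped only for
migrate_util, so the misfit and ASYM_PACKING cases are left untouched.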