Re: [PATCH v2 5/8] sched/fair: use rq->nr_running when balancing load

From: Vincent Guittot
Date: Mon Sep 02 2019 - 09:07:41 EST


Hi Hillf,

Sorry for the late reply.
I have noticed that i didn't answer your question while preparing v3

On Fri, 9 Aug 2019 at 07:21, Hillf Danton <hdanton@xxxxxxxx> wrote:
>
>
> On Thu, 1 Aug 2019 16:40:21 +0200 Vincent Guittot wrote:
> >
> > cfs load_balance only takes care of CFS tasks whereas CPUs can be used by
> > other scheduling class. Typically, a CFS task preempted by a RT or deadline
> > task will not get a chance to be pulled on another CPU because the
> > load_balance doesn't take into account tasks from classes.
>
> We can add something accordingly in RT to push cfs tasks to another cpu
> in this direction if the pulling above makes some sense missed long.

RT class doesn't and can't touch CFS tasks but the ilb will be kicked
to check if another CPU can pull the CFS task.

> I doubt we can as we can not do too much about RT tasks on any cpu.
> Nor is busiest cpu selected for load balancing based on a fifo cpuhog.

This patch takes into account all type tasks when checking the state
of a group and when trying to balance the number of tasks but of
course we can only detach and move the cfs tasks at the end.

So if we have 1 RT task and 1 CFS task competing for the same CPU but
there is an idle CPU, the CFS task will be pulled during the
load_balance of the idle CPU whereas it was not the case before.

>
> > Add sum of nr_running in the statistics and use it to detect such
> > situation.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index a8681c3..f05f1ad 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7774,6 +7774,7 @@ struct sg_lb_stats {
> > unsigned long group_load; /* Total load over the CPUs of the group */
> > unsigned long group_capacity;
> > unsigned long group_util; /* Total utilization of the group */
> > + unsigned int sum_nr_running; /* Nr tasks running in the group */
> > unsigned int sum_h_nr_running; /* Nr tasks running in the group */
>
> A different comment is appreciated.

ok

>
> > unsigned int idle_cpus;
> > unsigned int group_weight;
> > @@ -8007,7 +8008,7 @@ static inline int sg_imbalanced(struct sched_group *group)
> > static inline bool
> > group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs)
> > {
> > - if (sgs->sum_h_nr_running < sgs->group_weight)
> > + if (sgs->sum_nr_running < sgs->group_weight)
> > return true;
> >
> > if ((sgs->group_capacity * 100) >
> > @@ -8028,7 +8029,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs)
> > static inline bool
> > group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs)
> > {
> > - if (sgs->sum_h_nr_running <= sgs->group_weight)
> > + if (sgs->sum_nr_running <= sgs->group_weight)
> > return false;
> >
> > if ((sgs->group_capacity * 100) <
> > @@ -8132,6 +8133,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> > sgs->sum_h_nr_running += rq->cfs.h_nr_running;
> >
> > nr_running = rq->nr_running;
> > + sgs->sum_nr_running += nr_running;
> > +
> > if (nr_running > 1)
> > *sg_status |= SG_OVERLOAD;
> >
> > @@ -8480,7 +8483,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> > * groups.
> > */
> > env->balance_type = migrate_task;
> > - env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;
> > + env->imbalance = (busiest->sum_nr_running - local->sum_nr_running) >> 1;
>
> Can we detach RR tasks?
>
> > return;
> > }
> >
> > @@ -8643,7 +8646,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> >
> > /* Try to move all excess tasks to child's sibling domain */
> > if (sds.prefer_sibling && local->group_type == group_has_spare &&
> > - busiest->sum_h_nr_running > local->sum_h_nr_running + 1)
> > + busiest->sum_nr_running > local->sum_nr_running + 1)
> > goto force_balance;
> >
> > if (busiest->group_type != group_overloaded &&
> > --
> > 2.7.4
>