Re: [RFC 4/4] sched/fair: Take into runnable_avg to classify group

From: Valentin Schneider
Date: Thu Feb 13 2020 - 13:32:51 EST


On 2/11/20 5:46 PM, Vincent Guittot wrote:
> Take into account the new runnable_avg signal to classify a group and to
> mitigate the volatility of util_avg in face of intensive migration.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7d7cb207be30..5f8f12c902d4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7691,6 +7691,7 @@ struct sg_lb_stats {
> unsigned long group_load; /* Total load over the CPUs of the group */
> unsigned long group_capacity;
> unsigned long group_util; /* Total utilization of the group */
> + unsigned long group_runnable; /* Total utilization of the group */
^^^^^^^^^^^
"Total runnable" hurts my eyes, but in any case this comment shouldn't just
say "utilization".

> unsigned int sum_nr_running; /* Nr of tasks running in the group */
> unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */
> unsigned int idle_cpus;
> @@ -7911,6 +7912,10 @@ group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
> if (sgs->sum_nr_running < sgs->group_weight)
> return true;
>
> + if ((sgs->group_capacity * imbalance_pct) <
> + (sgs->group_runnable * 100))
> + return false;
> +

I haven't stared long enough at patch 2, but I'll ask anyway: with this new
condition, do we still need the next one (based on util)? AIUI
group_runnable is >= group_util, so if group_runnable is within the allowed
margin then group_util has to be as well.
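
To make the question easier to poke at, here is a minimal userspace sketch of
group_has_capacity() with the new check in place. The struct and function
names ending in _sketch are hypothetical stand-ins, not kernel code, and the
sample values used with it are made up. One thing it makes visible: as written
in the diff, the runnable check compares against capacity * imbalance_pct / 100
while the util check compares against capacity * 100 / imbalance_pct, so the
two thresholds are not the same.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sg_lb_stats fields used by the checks. */
struct sg_stats_sketch {
	unsigned long group_capacity;
	unsigned long group_util;
	unsigned long group_runnable;
	unsigned int sum_nr_running;
	unsigned int group_weight;
};

/* New check from the patch: runnable above capacity scaled up by imbalance_pct. */
static bool runnable_over_margin(unsigned int imbalance_pct,
				 const struct sg_stats_sketch *sgs)
{
	return (sgs->group_capacity * imbalance_pct) <
	       (sgs->group_runnable * 100);
}

/* Existing check: util below capacity scaled down by imbalance_pct. */
static bool util_under_margin(unsigned int imbalance_pct,
			      const struct sg_stats_sketch *sgs)
{
	return (sgs->group_capacity * 100) >
	       (sgs->group_util * imbalance_pct);
}

/* Same ordering of checks as in the quoted hunk. */
static bool group_has_capacity_sketch(unsigned int imbalance_pct,
				      const struct sg_stats_sketch *sgs)
{
	if (sgs->sum_nr_running < sgs->group_weight)
		return true;

	if (runnable_over_margin(imbalance_pct, sgs))
		return false;

	if (util_under_margin(imbalance_pct, sgs))
		return true;

	return false;
}
```

With capacity = 1024 and imbalance_pct = 117, a group with util = 1000 and
runnable = 1100 passes the runnable check but is still reported as having no
capacity by the util check, so in this sketch the two conditions do not
collapse into one.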

> if ((sgs->group_capacity * 100) >
> (sgs->group_util * imbalance_pct))
> return true;
> @@ -7936,6 +7941,10 @@ group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
> (sgs->group_util * imbalance_pct))
> return true;
>
> + if ((sgs->group_capacity * imbalance_pct) <
> + (sgs->group_runnable * 100))
> + return true;
> +

Ditto on group_runnable >= group_util: we could get rid of the check
above this one.
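
A rough way to test that is to put the two overload conditions side by side in
userspace. This is only a sketch: the names are hypothetical, and the first
half of the util comparison is an assumption, since only its second line
appears in the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sg_lb_stats fields used by the two checks. */
struct overload_stats_sketch {
	unsigned long group_capacity;
	unsigned long group_util;
	unsigned long group_runnable;
};

/*
 * Util-based check. ASSUMPTION: the left-hand side is taken to be
 * group_capacity * 100; the quoted context only shows the right-hand side.
 */
static bool util_overloaded_sketch(unsigned int imbalance_pct,
				   const struct overload_stats_sketch *sgs)
{
	return (sgs->group_capacity * 100) <
	       (sgs->group_util * imbalance_pct);
}

/* Runnable-based check, copied from the lines added by the patch. */
static bool runnable_overloaded_sketch(unsigned int imbalance_pct,
				       const struct overload_stats_sketch *sgs)
{
	return (sgs->group_capacity * imbalance_pct) <
	       (sgs->group_runnable * 100);
}
```

Under that assumption, with capacity = 1024 and imbalance_pct = 117, a group
with util = runnable = 1000 trips the util check but not the runnable one, so
whether the util check can go depends on how far runnable sits above util in
practice.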

> return false;
> }
>
> @@ -8030,6 +8039,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>
> sgs->group_load += cpu_load(rq);
> sgs->group_util += cpu_util(i);
> + sgs->group_runnable += cpu_runnable(rq);
> sgs->sum_h_nr_running += rq->cfs.h_nr_running;
>
> nr_running = rq->nr_running;
>