Re: [PATCH 1/2] sched/fair: Fix weight overly small for interactive group entity

From: Dietmar Eggemann
Date: Fri Oct 16 2015 - 10:06:13 EST


Hi Yuyang,

On 13/10/15 02:18, Yuyang Du wrote:
> Commit 9d89c257dfb9c51a532d69 (sched/fair: Rewrite runnable load
> and utilization average tracking) led to overly small weight for
> interactive group entity. The case can be easily reproduced when
> a number of CPU hogs compete for the CPUs at the same time (thanks
> to Mike). This is largely because the task group's load average
> tracking across CPUs lags behind the real changes.
>
> We accelerate the group share distribution process by using the
> load.weight of the cfs_rq. This may increase the entire group's
> share, but we have to do so to protect the (fragile) interactive
> tasks, especially from CPU hogs.
>
> Reported-by: Mike Galbraith <umgwanakikbuti@xxxxxxxxx>
> Signed-off-by: Yuyang Du <yuyang.du@xxxxxxxxx>
> ---
> kernel/sched/fair.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 700eb54..601a253 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> */

In commit 9d89c257dfb9c51a532d69 you also changed the comment on top of
this from 'CPU's actual weight' to 'CPU's real-time load'. I assume here
that the former stands for cfs_rq->load.weight and the latter for
cfs_rq->avg.load_avg. Not sure though ...

> tg_weight = atomic_long_read(&tg->load_avg);
> tg_weight -= cfs_rq->tg_load_avg_contrib;
> - tg_weight += cfs_rq_load_avg(cfs_rq);
> + tg_weight += cfs_rq->load.weight;
>
> return tg_weight;
> }
> @@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
> long tg_weight, load, shares;
>
> tg_weight = calc_tg_weight(tg, cfs_rq);
> - load = cfs_rq_load_avg(cfs_rq);
> + load = cfs_rq->load.weight;
>
> shares = (tg->shares * load);
> if (tg_weight)
>
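
To make the effect of the two hunks concrete, here is a rough userspace
sketch of the share calculation they touch. The struct and function names
(model_tg, model_cfs_rq, model_calc_shares) are made up for illustration
and are not the kernel's types, and the lower/upper clamp is my
recollection of the in-tree code rather than something visible in this
patch, so treat it as an assumption.

```c
/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct model_cfs_rq {
	long load_weight;         /* instantaneous weight, like cfs_rq->load.weight */
	long load_avg;            /* decayed average, like cfs_rq->avg.load_avg */
	long tg_load_avg_contrib; /* what this cfs_rq last added to tg->load_avg */
};

struct model_tg {
	long shares;   /* configured group shares, like tg->shares */
	long load_avg; /* sum of the per-CPU contributions, like tg->load_avg */
};

#define MODEL_MIN_SHARES 2L /* assumed lower clamp, not shown in the patch */

/*
 * use_weight == 0: pre-patch behaviour (local load taken from load_avg)
 * use_weight != 0: patched behaviour (local load taken from load.weight)
 */
static long model_calc_shares(const struct model_tg *tg,
			      const struct model_cfs_rq *cfs_rq,
			      int use_weight)
{
	long load = use_weight ? cfs_rq->load_weight : cfs_rq->load_avg;

	/* Replace this CPU's stale contribution with its current local load. */
	long tg_weight = tg->load_avg - cfs_rq->tg_load_avg_contrib + load;
	long shares = tg->shares * load;

	if (tg_weight)
		shares /= tg_weight;

	/* Assumed clamping to [MODEL_MIN_SHARES, tg->shares]. */
	if (shares < MODEL_MIN_SHARES)
		shares = MODEL_MIN_SHARES;
	if (shares > tg->shares)
		shares = tg->shares;

	return shares;
}
```

With invented numbers tg->shares = 1024 and tg->load_avg = 900, and a
cfs_rq whose load_avg and contribution are still 100 while its load.weight
is already 1024 (a task that has just woken up), the averaged variant
yields 113 and the load.weight variant yields 574.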

I get similar test results on an i7-4750HQ (1*4*2) system.

Test setup: 1 cpuhog (sysbench --test cpu --num-threads 1) each in its
            own cpu cgroup and each pinned to a cpu
            mplayer (threads=8) in a cpu cgroup
            (BigBuckBunny-DivXPlusHD.mkv)
runtime:    100s
system:     Ubuntu 14.04.3 LTS desktop

sum of the runtime of the mplayer threads:

commit cd126afe838d (before pelt rewrite): 55.6s
4.3.0-rc5 : 36.5s
4.3.0-rc5 + patch 1/2 and 2/2 : 55.7s
