Re: 4.3 group scheduling regression

From: Peter Zijlstra
Date: Mon Oct 12 2015 - 07:47:31 EST


On Mon, Oct 12, 2015 at 10:12:31AM +0800, Yuyang Du wrote:
> On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:

> > So in the old code we had 'magic' to deal with the case where a cgroup
> > was consuming less than 1 cpu's worth of runtime. For example, a single
> > task running in the group.
> >
> > In that scenario it might be possible that the group entity weight:
> >
> > se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
> >
> > Strongly deviates from the tg->shares; you want the single task reflect
> > the full group shares to the next level; due to the whole distributed
> > approximation stuff.
>
> Yeah, I thought so.
>
> > I see you've deleted all that code; see the former
> > __update_group_entity_contrib().
>
> Probably not there, it actually was an icky way to adjust things.

Yeah, no argument there.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4df37a4..b184da0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> */
> tg_weight = atomic_long_read(&tg->load_avg);
> tg_weight -= cfs_rq->tg_load_avg_contrib;
> - tg_weight += cfs_rq_load_avg(cfs_rq);
> + tg_weight += cfs_rq->load.weight;
>
> return tg_weight;
> }
> @@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
> long tg_weight, load, shares;
>
> tg_weight = calc_tg_weight(tg, cfs_rq);
> - load = cfs_rq_load_avg(cfs_rq);
> + load = cfs_rq->load.weight;
>
> shares = (tg->shares * load);
> if (tg_weight)

Aah, yes very much so. I completely overlooked that :-(

When calculating shares we very much want the current load, not the load
average.

Also, should we do the below? At this point se->on_rq is still 0 so
reweight_entity() will not update (dequeue/enqueue) the accounting, but
we'll have just accounted the 'old' load.weight.

Doing it this way around we'll first update the weight and then account
it, which seems more accurate.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 700eb548315f..d2efef565aed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3009,8 +3009,8 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
*/
update_curr(cfs_rq);
enqueue_entity_load_avg(cfs_rq, se);
- account_entity_enqueue(cfs_rq, se);
update_cfs_shares(cfs_rq);
+ account_entity_enqueue(cfs_rq, se);

if (flags & ENQUEUE_WAKEUP) {
place_entity(cfs_rq, se, 0);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/