Re: 4.3 group scheduling regression

From: Yuyang Du
Date: Mon Oct 12 2015 - 06:01:08 EST


On Mon, Oct 12, 2015 at 11:12:06AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 12, 2015 at 08:53:51AM +0800, Yuyang Du wrote:
> > Good morning, Peter.
> >
> > On Mon, Oct 12, 2015 at 10:04:07AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 12, 2015 at 09:44:57AM +0200, Mike Galbraith wrote:
> > >
> > > > It's odd to me that things look pretty much the same good/bad tree with
> > > > hogs vs hogs or hogs vs tbench (with top anyway, just adding up times).
> > > > Seems Xorg+mplayer more or less playing cross group ping-pong must be
> > > > the BadThing trigger.
> > >
> > > Ohh, wait, Xorg and mplayer are _not_ in the same group? I was assuming
> > > you had your entire user session in 1 (auto) group and was competing
> > > against 8 manual cgroups.
> > >
> > > So how exactly are things configured?
> >
> > Hmm... my impression is the naughty boy mplayer (+Xorg) isn't favored, due
> > to the per CPU group entity share distribution. Let me dig more.
>
> So in the old code we had 'magic' to deal with the case where a cgroup
> was consuming less than 1 cpu's worth of runtime. For example, a single
> task running in the group.
>
> In that scenario it might be possible that the group entity weight:
>
> se->weight = (tg->shares * cfs_rq->weight) / tg->weight;
>
> Strongly deviates from the tg->shares; you want the single task reflect
> the full group shares to the next level; due to the whole distributed
> approximation stuff.

Yeah, I thought so.
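
To make the deviation concrete, here is a minimal userspace sketch of the
quoted formula. The helper name group_se_weight() and the numbers are
hypothetical, chosen only for illustration; this is not kernel code.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the quoted formula:
 *   se->weight = (tg->shares * cfs_rq->weight) / tg->weight
 * tg_weight stands for the group-wide total as seen from this CPU.
 */
static long group_se_weight(long tg_shares, long cfs_rq_weight, long tg_weight)
{
	/* Assumption for the sketch: with no group total, hand out full shares. */
	if (tg_weight <= 0)
		return tg_shares;

	return tg_shares * cfs_rq_weight / tg_weight;
}

/* Print the two cases discussed in the note below. */
static void show_group_se_weight(void)
{
	/* Single nice-0 task (weight 1024), group total is exact. */
	printf("exact total: %ld\n", group_se_weight(1024, 1024, 1024));
	/* Same task, but the group total is twice what this CPU holds. */
	printf("inflated total: %ld\n", group_se_weight(1024, 1024, 2048));
}
```

With tg->shares = 1024 and a single nice-0 task, an exact group total of 1024
yields the full 1024, while a total of 2048 (say, load still accounted to
another CPU) yields only 512, i.e. half of what the single task should
reflect to the next level.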

> I see you've deleted all that code; see the former
> __update_group_entity_contrib().

Probably not there; that was actually an icky way to adjust things.

> It could be that we need to bring that back. But let me think a little
> bit more on this.. I'm having a hard time waking :/

I am guessing the problem is in calc_tg_weight(), where naughty boys do end
up more favored. What a reality...

Mike, could you please test the following?

--

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..b184da0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2370,7 +2370,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq_load_avg(cfs_rq);
+	tg_weight += cfs_rq->load.weight;
 
 	return tg_weight;
 }
@@ -2380,7 +2380,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq_load_avg(cfs_rq);
+	load = cfs_rq->load.weight;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
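
For anyone following along, below is a rough userspace model of what the
two hunks change. It is only a sketch: the model_* structs, the use_weight
switch, the MODEL_MIN_SHARES floor and all numbers are assumptions made up
for illustration, not kernel definitions and not measurements.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the fields the patch touches. */
struct model_cfs_rq {
	long load_weight;          /* instantaneous weight of queued tasks */
	long load_avg;             /* decayed average load of this cfs_rq */
	long tg_load_avg_contrib;  /* what this cfs_rq last added to the group sum */
};

struct model_tg {
	long shares;               /* configured group shares */
	long load_avg;             /* sum of all per-CPU contributions */
};

#define MODEL_MIN_SHARES 2     /* assumed lower clamp for the sketch */

/* use_weight == 0 models the code before the patch, 1 the code after it. */
static long model_calc_shares(const struct model_tg *tg,
			      const struct model_cfs_rq *rq, int use_weight)
{
	long local = use_weight ? rq->load_weight : rq->load_avg;
	/* Replace this CPU's stale contribution with its current local load. */
	long tg_weight = tg->load_avg - rq->tg_load_avg_contrib + local;
	long shares = tg->shares * local;

	if (tg_weight)
		shares /= tg_weight;
	if (shares < MODEL_MIN_SHARES)
		shares = MODEL_MIN_SHARES;
	if (shares > tg->shares)
		shares = tg->shares;
	return shares;
}

/* Print the before/after values for one made-up configuration. */
static void show_model_shares(void)
{
	struct model_cfs_rq rq = { 1024, 100, 100 };
	struct model_tg tg = { 1024, 400 };

	printf("load_avg based:    %ld\n", model_calc_shares(&tg, &rq, 0));
	printf("load.weight based: %ld\n", model_calc_shares(&tg, &rq, 1));
}
```

In this made-up case a queued nice-0 task has load.weight 1024 but a decayed
load_avg of only 100, and the rest of the group contributes 300. The
load_avg based computation gives the group entity 256, the load.weight based
one gives 791.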