Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes

From: Peter Zijlstra
Date: Tue Oct 18 2016 - 08:07:53 EST


On Tue, Oct 18, 2016 at 12:15:11PM +0100, Dietmar Eggemann wrote:
> On 18/10/16 10:07, Peter Zijlstra wrote:
> > On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:

> > On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> > online_fair_sched_group() cures this, this would actually match with
> > unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> > a few instructions on the enqueue path, so that's all good.
>
> Yes, I was able to recreate a similar problem (not related to the cpu
> masks) on ARM64 (6 logical cpus). I created 100 2nd-level tg's but only
> put one task (no cpu affinity, so it could run on multiple cpus) in one
> of these tg's (mainly to see the related cfs_rq's in /proc/sched_debug).
>
> I get a remaining .tg_load_avg : 49898 for cfs_rq[x]:/tg_1

Ah, and since all those CPUs are online, we decay all that load away.
OK, makes sense now.
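
IOW, something like the below is what I understand you tried (untested
sketch, from memory, so the exact loop body of online_fair_sched_group()
may be off):

void online_fair_sched_group(struct task_group *tg)
{
	struct sched_entity *se;
	struct rq *rq;
	int i;

	for_each_possible_cpu(i) {
		rq = cpu_rq(i);
		se = tg->se[i];

		raw_spin_lock_irq(&rq->lock);
		post_init_entity_util_avg(se);
		/*
		 * Put the new cfs_rq on the leaf_cfs_rq list right away,
		 * so update_blocked_averages() finds it and decays its
		 * blocked load even if no task ever runs in this group
		 * on this CPU. unregister_fair_sched_group() already does
		 * the matching list_del_leaf_cfs_rq().
		 */
		list_add_leaf_cfs_rq(tg->cfs_rq[i]);
		sync_throttle(tg, i);
		raw_spin_unlock_irq(&rq->lock);
	}
}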

> > I'm just not immediately seeing how that cures things. The only relevant
> > user of the leaf_cfs_rq list seems to be update_blocked_averages() which
> > is called from the balance code (idle_balance() and
> > rebalance_domains()). But neither should call that for offline (or
> > !present) CPUs.
>
> Assuming this is load from the 99 2nd-level tg's which never had a task
> running, putting list_add_leaf_cfs_rq() into online_fair_sched_group()
> for all cpus makes sure that all the 'blocked load' gets decayed.
>
> Doing what Vincent just suggested, initializing tg se's w/ 0 instead of
> 1024, makes this unnecessary.

Indeed. I just worry about the cases where we do not propagate the load
up, e.g. the stuff fixed by:

1476695653-12309-5-git-send-email-vincent.guittot@xxxxxxxxxx

If we hit an intermediary cgroup with 0 load, we might get some
interactivity issues.
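
For completeness, the group-se zero init would, as far as I understand
Vincent's suggestion, look roughly like the below (untested sketch from
memory, not the actual patch):

void init_entity_runnable_average(struct sched_entity *se)
{
	struct sched_avg *sa = &se->avg;

	sa->last_update_time = 0;
	sa->period_contrib = 1023;
	/*
	 * Tasks start with their full weight so they look "heavy" until
	 * their load stabilizes; group entities start at 0 since nothing
	 * is attached to the group yet, so no phantom 1024 per CPU ends
	 * up in tg->load_avg.
	 */
	if (entity_is_task(se))
		sa->load_avg = scale_load_down(se->load.weight);
	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
	sa->util_avg = 0;
	sa->util_sum = 0;
}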

But it could be I got lost again :-)