Re: [PATCH v2 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity

From: Tejun Heo
Date: Wed May 03 2017 - 17:46:01 EST

Hello, Peter.

On Wed, May 03, 2017 at 08:00:28PM +0200, Peter Zijlstra wrote:
> Just FUDGE2 on its own seems to be the best on my system and is a change
> that makes sense (and something Paul recently pointed out as well).
> The implementation isn't particularly pretty or fast, but should
> illustrate the idea.
> Poking at the whole update_tg_cfs_load() thing only makes it worse after
> that. And while I agree that that code is mind bending; it seems to work
> OK-ish.
> Tejun, Vincent, could you guys have a poke?

So, here are the results of some preliminary testing.

FUDGE: Cuts the number of wrong picks by about 70% and p99
latency by about half; however, the resulting p99 is still
5 to 10 times worse than in the !cgroup case.

FUDGE2: Changes things a lot (load values go wild), but only because
it's missing scale_load_down(). After adding
scale_load_down(), it doesn't do much. For this to work, it
needs to always be propagated, which, btw, shouldn't be
prohibitively expensive given the other operations
performed at the same time.
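For anyone following along, here is a minimal sketch of why a missing scale_load_down() makes the load values "go wild". It assumes a 64-bit build, where load weights carry SCHED_FIXEDPOINT_SHIFT (10) extra bits of fixed-point resolution; the two macros are simplified copies of scale_load()/scale_load_down(). to_avg_units() is a hypothetical helper, not a kernel function, and this does not reproduce the actual FUDGE2 patch. It only shows the magnitude of the error.

```c
/* Assumption: 64-bit build, where load weights carry
 * SCHED_FIXEDPOINT_SHIFT (10) extra bits of fixed-point resolution.
 * Simplified versions of scale_load()/scale_load_down(). */
#define SCHED_FIXEDPOINT_SHIFT 10
#define scale_load(w)      ((unsigned long)(w) << SCHED_FIXEDPOINT_SHIFT)
#define scale_load_down(w) ((unsigned long)(w) >> SCHED_FIXEDPOINT_SHIFT)

/* Hypothetical helper: convert a high-resolution weight into the
 * low-resolution units that load averages are kept in.  Passing the
 * weight through unscaled (scale_down == 0) yields a value that is
 * 2^SCHED_FIXEDPOINT_SHIFT (1024) times too large. */
static unsigned long to_avg_units(unsigned long weight, int scale_down)
{
	return scale_down ? scale_load_down(weight) : weight;
}
```

With a nice-0 weight, to_avg_units(scale_load(1024), 1) gives 1024, while to_avg_units(scale_load(1024), 0) gives 1048576, which is the kind of blow-up you see when the down-scaling is forgotten.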