Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)

From: Christian Borntraeger
Date: Mon Sep 26 2016 - 08:01:56 EST


On 09/26/2016 01:53 PM, Peter Zijlstra wrote:
> On Mon, Sep 26, 2016 at 01:42:05PM +0200, Christian Borntraeger wrote:
>> On 09/26/2016 12:56 PM, Peter Zijlstra wrote:
>
>>> One of the differences in the old and new thing is being addressed by
>>> these patches:
>>>
>>> https://lkml.kernel.org/r/1473666472-13749-1-git-send-email-vincent.guittot@xxxxxxxxxx
>>>
>>> Could you see if those patches make a difference? If not, we'll have to
>>> go poke elsewhere ofcourse ;-)
>>
>> Those patches do not apply cleanly on v4.7, linux/master or next/master.
>> Is there a good branch to test these patches?
>
> They seemed to apply for me on tip/sched/core, I pushed out a branch for
> you that has them on.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/propagate
>
> I didn't boot the result though; but they applied without issue.

They applied OK on linux-next from 9/13. Things got even worse.
With this host configuration:

CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
0   0    0    0      0    0:0:0:0         yes    yes        0
1   0    0    0      0    1:1:1:1         yes    yes        1
2   0    0    0      1    2:2:2:2         yes    yes        2
3   0    0    0      1    3:3:3:3         yes    yes        3
4   0    0    1      2    4:4:4:4         yes    yes        4
5   0    0    1      2    5:5:5:5         yes    yes        5
6   0    0    1      3    6:6:6:6         yes    yes        6
7   0    0    1      3    7:7:7:7         yes    yes        7
8   0    0    1      4    8:8:8:8         yes    yes        8
9   0    0    1      4    9:9:9:9         yes    yes        9
10  0    0    1      5    10:10:10:10     yes    yes        10
11  0    0    1      5    11:11:11:11     yes    yes        11
12  0    0    1      6    12:12:12:12     yes    yes        12
13  0    0    1      6    13:13:13:13     yes    yes        13
14  0    0    1      7    14:14:14:14     yes    yes        14
15  0    0    1      7    15:15:15:15     yes    yes        15

The guest was running either on CPUs 0-3 or on CPUs 4-15, but never
used the full system. With group scheduling disabled, everything was good
again. So it looks like this bug also has some dependency on the
host topology.
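For reference, the two CPU sets above line up with the SOCKET column of
the table. A minimal sketch (the helper names cpus_by_socket and to_ranges
are hypothetical, and it assumes the column order shown above) that groups
the rows by socket:

```python
from collections import defaultdict

# Rows copied from the topology table above. Assumed column order:
# CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
TABLE = """\
0 0 0 0 0 0:0:0:0 yes yes 0
1 0 0 0 0 1:1:1:1 yes yes 1
2 0 0 0 1 2:2:2:2 yes yes 2
3 0 0 0 1 3:3:3:3 yes yes 3
4 0 0 1 2 4:4:4:4 yes yes 4
5 0 0 1 2 5:5:5:5 yes yes 5
6 0 0 1 3 6:6:6:6 yes yes 6
7 0 0 1 3 7:7:7:7 yes yes 7
8 0 0 1 4 8:8:8:8 yes yes 8
9 0 0 1 4 9:9:9:9 yes yes 9
10 0 0 1 5 10:10:10:10 yes yes 10
11 0 0 1 5 11:11:11:11 yes yes 11
12 0 0 1 6 12:12:12:12 yes yes 12
13 0 0 1 6 13:13:13:13 yes yes 13
14 0 0 1 7 14:14:14:14 yes yes 14
15 0 0 1 7 15:15:15:15 yes yes 15
"""


def cpus_by_socket(table: str) -> dict[int, list[int]]:
    """Map each SOCKET id (4th column) to the CPU ids (1st column) on it."""
    groups: dict[int, list[int]] = defaultdict(list)
    for line in table.strip().splitlines():
        fields = line.split()
        cpu, socket = int(fields[0]), int(fields[3])
        groups[socket].append(cpu)
    return dict(groups)


def to_ranges(cpus: list[int]) -> str:
    """Compress a CPU list into cpulist notation, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    cpus = sorted(cpus)
    parts = []
    start = prev = cpus[0]
    for c in cpus[1:]:
        if c == prev + 1:
            prev = c
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = c
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


if __name__ == "__main__":
    for socket, cpus in sorted(cpus_by_socket(TABLE).items()):
        print(f"socket {socket}: CPUs {to_ranges(cpus)}")
```

Run as-is, this prints "socket 0: CPUs 0-3" and "socket 1: CPUs 4-15",
i.e. exactly the two sets the guest got stuck on, which is consistent with
the imbalance following the socket boundary.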

Christian