Re: Scheduler regression from caffcdd8d27ba78730d5540396ce72ad022aff2c

From: Dietmar Eggemann
Date: Fri Jul 18 2014 - 05:28:24 EST


On 18/07/14 07:34, Bruno Wolff III wrote:
> On Thu, Jul 17, 2014 at 14:35:02 +0200,
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>
>> In any case, can someone who can trigger this run with the below; its
>> 'clean' for me, but supposedly you'll trigger a FAIL somewhere.
>
> I got a couple of fail messages.
>
> dmesg output is available in the bug as the following attachment:
> https://bugzilla.kernel.org/attachment.cgi?id=143361
>
> The part of interest is probably:
>
> [ 0.253354] build_sched_groups: got group f255b020 with cpus:
> [ 0.253436] build_sched_groups: got group f255b120 with cpus:
> [ 0.253519] build_sched_groups: got group f255b1a0 with cpus:
> [ 0.253600] build_sched_groups: got group f255b2a0 with cpus:
> [ 0.253681] build_sched_groups: got group f255b2e0 with cpus:
> [ 0.253762] build_sched_groups: got group f255b320 with cpus:
> [ 0.253843] build_sched_groups: got group f255b360 with cpus:
> [ 0.254004] build_sched_groups: got group f255b0e0 with cpus:
> [ 0.254087] build_sched_groups: got group f255b160 with cpus:
> [ 0.254170] build_sched_groups: got group f255b1e0 with cpus:
> [ 0.254252] build_sched_groups: FAIL
> [ 0.254331] build_sched_groups: got group f255b1a0 with cpus: 0
> [ 0.255004] build_sched_groups: FAIL
> [ 0.255084] build_sched_groups: got group f255b1e0 with cpus: 1

That (partly) explains it. f255b1a0 (5) and f255b1e0 (6) are reused here! This reuse doesn't happen on my machines.

But if they are then used for a different cpu mask (not including cpu0 resp. cpu1), wouldn't that mess up their first usage?

I guess that the second time, cpu3 will be added to the cpumask of f255b1a0 and cpu4 to f255b1e0?

Maybe we can extend PeterZ's patch to print out cpu and span as well, and also use this printk in free_sched_domain(), to debug further if this is not enough evidence?

[ 0.252059] __sdt_alloc: allocated f255b020 with cpus: (1)
[ 0.252147] __sdt_alloc: allocated f255b0e0 with cpus: (2)
[ 0.252229] __sdt_alloc: allocated f255b120 with cpus: (3)
[ 0.252311] __sdt_alloc: allocated f255b160 with cpus: (4)
[ 0.252395] __sdt_alloc: allocated f255b1a0 with cpus: (5)
[ 0.252477] __sdt_alloc: allocated f255b1e0 with cpus: (6)
[ 0.252559] __sdt_alloc: allocated f255b220 with cpus: (7) (not used)
[ 0.252641] __sdt_alloc: allocated f255b260 with cpus: (8) (not used)
[ 0.253013] __sdt_alloc: allocated f255b2a0 with cpus: (9)
[ 0.253097] __sdt_alloc: allocated f255b2e0 with cpus: (10)
[ 0.253184] __sdt_alloc: allocated f255b320 with cpus: (11)
[ 0.253265] __sdt_alloc: allocated f255b360 with cpus: (12)

[ 0.253354] build_sched_groups: got group f255b020 with cpus: (1)
[ 0.253436] build_sched_groups: got group f255b120 with cpus: (3)
[ 0.253519] build_sched_groups: got group f255b1a0 with cpus: (5)
[ 0.253600] build_sched_groups: got group f255b2a0 with cpus: (9)
[ 0.253681] build_sched_groups: got group f255b2e0 with cpus: (10)
[ 0.253762] build_sched_groups: got group f255b320 with cpus: (11)
[ 0.253843] build_sched_groups: got group f255b360 with cpus: (12)
[ 0.254004] build_sched_groups: got group f255b0e0 with cpus: (2)
[ 0.254087] build_sched_groups: got group f255b160 with cpus: (4)
[ 0.254170] build_sched_groups: got group f255b1e0 with cpus: (6)
[ 0.254252] build_sched_groups: FAIL
[ 0.254331] build_sched_groups: got group f255b1a0 with cpus: 0 (5)
[ 0.255004] build_sched_groups: FAIL
[ 0.255084] build_sched_groups: got group f255b1e0 with cpus: 1 (6)
[ 0.255365] devtmpfs: initialized
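
As a cross-check that doesn't rely on eyeballing the log, a small script can flag any group address that build_sched_groups hands out more than once. This is only a sketch: it assumes the exact "got group ... with cpus:" message format seen in the dmesg above, and the name find_reused_groups is made up for illustration.

```python
import re
from collections import defaultdict

# Matches debug lines such as:
#   [ 0.254331] build_sched_groups: got group f255b1a0 with cpus: 0
# The cpus field may be empty, as it is on a group's first use.
GOT_GROUP = re.compile(
    r"build_sched_groups: got group ([0-9a-f]+) with cpus:\s*(\S*)"
)


def find_reused_groups(dmesg_text):
    """Return {address: [cpus field per occurrence]} for every group
    address that shows up in more than one 'got group' line."""
    seen = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = GOT_GROUP.search(line)
        if match:
            seen[match.group(1)].append(match.group(2))
    return {addr: cpus for addr, cpus in seen.items() if len(cpus) > 1}


# Excerpt of the log quoted above, reduced to the two affected groups.
SAMPLE = """\
[ 0.253519] build_sched_groups: got group f255b1a0 with cpus:
[ 0.254170] build_sched_groups: got group f255b1e0 with cpus:
[ 0.254252] build_sched_groups: FAIL
[ 0.254331] build_sched_groups: got group f255b1a0 with cpus: 0
[ 0.255004] build_sched_groups: FAIL
[ 0.255084] build_sched_groups: got group f255b1e0 with cpus: 1
"""

if __name__ == "__main__":
    for addr, cpus in find_reused_groups(SAMPLE).items():
        print(addr, cpus)
```

Feeding it the full attachment instead of SAMPLE should report the same two addresses if nothing else is reused; an empty result would mean every group was handed out only once.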


> I also booted with early printk=keepsched_debug as requested by
> Dietmar.


I didn't see what I was looking for in your dmesg output. Did you use
'earlyprintk=keep sched_debug'?






