Re: Scheduler regression from caffcdd8d27ba78730d5540396ce72ad022aff2c

From: Dietmar Eggemann
Date: Thu Jul 17 2014 - 04:58:13 EST


On 17/07/14 05:09, Bruno Wolff III wrote:
> On Thu, Jul 17, 2014 at 01:18:36 +0200,
>  Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>> So the output of
>>
>> $ cat /proc/sys/kernel/sched_domain/cpu*/domain*/*
>>
>> would be handy too.

Thanks, this was helpful.
I see from the sched domain layout that you have SMT (domain0) and DIE (domain1) levels, so on this system the MC level gets degenerated (sd_degenerate() in kernel/sched/core.c).
So far I fail to see how this could affect the memory of the sched groups, but I can try to fake this situation on one of my platforms.
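
For reference, the per-domain 'name' files show this layout directly; on your box

$ cat /proc/sys/kernel/sched_domain/cpu*/domain*/name

should just print SMT and DIE for each cpu.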

There is also the possibility that the memory for sched_group sg is not (completely) zeroed out:

  sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(),
                    GFP_KERNEL, cpu_to_node(j));


struct sched_group {
        ...
        /*
         * The CPUs this group covers.
         *
         * NOTE: this field is variable length. (Allocated dynamically
         * by attaching extra space to the end of the structure,
         * depending on how many CPUs the kernel has booted up with)
         */
        unsigned long cpumask[0];
};

so that the cpumask of a sched group is not all zeros; that could only be cured by an explicit cpumask_clear(sched_group_cpus(sg)) in build_sched_groups() on this kind of machine.
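
Untested, and I'm paraphrasing the 3.16-era build_sched_groups() context from memory, but the clear I have in mind would sit right where the group's span gets rebuilt, roughly:

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ build_sched_groups(struct sched_domain *sd, int cpu) @@
 		group = get_group(i, sdd, &sg);
+		/* don't rely on the allocation having zeroed the trailing cpumask */
+		cpumask_clear(sched_group_cpus(sg));
 		cpumask_setall(sched_group_mask(sg));

That way the mask starts out empty before the cpumask_set_cpu() loop repopulates it, independent of what the allocator handed back.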


> Attached and added to the bug.

>> Just to make sure, you do have 'CONFIG_X86_32=y' and '# CONFIG_NUMA is
>> not set' in your build?

> Yes.

> I probably won't be able to get /proc/schedstat on my next test since the
> system will probably crash right away. However, I probably will have a
> much faster rebuild and might still be able to get the info for you
> before I leave tomorrow.


