Re: [RFC 2/3] sched/topology: fix sched groups on NUMA machines with mesh topology

From: Peter Zijlstra
Date: Fri Apr 14 2017 - 06:48:39 EST


On Thu, Apr 13, 2017 at 07:38:05PM -0400, Rik van Riel wrote:

> What do the sched groups look like for these topologies,
> before and after your patch series?
>
> > 4 nodes, ring topology
> > node distances:
> > node   0   1   2   3
> >   0:  10  20  30  20
> >   1:  20  10  20  30
> >   2:  30  20  10  20
> >   3:  20  30  20  10

kvm -smp 4 -m 4G -display none -monitor null -serial stdio -kernel
defconfig-build/arch/x86/boot/bzImage -append "sched_debug debug
ignore_loglevel earlyprintk=serial,ttyS0,115200,keep
numa=fake=4:10,20,30,20,20,10,20,30,30,20,10,20,20,30,20,10,0"

(FWIW, that's defconfig+kvmconfig+SCHED_DEBUG=y+NUMA_EMU=y)

Gives me:

[ 0.075004] smpboot: Total of 4 processors activated (22345.79 BogoMIPS)
[ 0.076767] CPU0 attaching sched-domain:
[ 0.077003] domain 0: span 0-1,3 level NUMA
[ 0.078002] groups: 0 1 3
[ 0.079002] domain 1: span 0-3 level NUMA
[ 0.080002] groups: 0-1,3 (cpu_capacity = 3072) 1-3 (cpu_capacity = 3072)
[ 0.081005] CPU1 attaching sched-domain:
[ 0.082003] domain 0: span 0-2 level NUMA
[ 0.083002] groups: 1 2 0
[ 0.084002] domain 1: span 0-3 level NUMA
[ 0.085002] groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072)
[ 0.086004] CPU2 attaching sched-domain:
[ 0.087002] domain 0: span 1-3 level NUMA
[ 0.088002] groups: 2 3 1
[ 0.089002] domain 1: span 0-3 level NUMA
[ 0.090002] groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072)
[ 0.091004] CPU3 attaching sched-domain:
[ 0.092002] domain 0: span 0,2-3 level NUMA
[ 0.093002] groups: 3 0 2
[ 0.094002] domain 1: span 0-3 level NUMA
[ 0.095002] groups: 0-1,3 (cpu_capacity = 3072) 1-3 (cpu_capacity = 3072)
[ 0.096004] span: 0-3 (max cpu_capacity = 1024)
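
As a reading aid, the spans in the output above can be reproduced from the quoted distance table. This is only a sketch under stated assumptions, not the kernel's topology code: it assumes one CPU per node (as in this 4-CPU, 4-fake-node guest), so node numbers and CPU numbers coincide, and that each NUMA level spans every node within that level's distance. The names DIST and numa_spans are made up for illustration.

```python
# Ring distance table quoted above (rows/columns are nodes 0..3).
DIST = [
    [10, 20, 30, 20],
    [20, 10, 20, 30],
    [30, 20, 10, 20],
    [20, 30, 20, 10],
]


def numa_spans(dist):
    """Return, per node, one span for each distinct non-local distance.

    Assumption: a level's span is every node whose distance from the
    given node is <= that level's distance.
    """
    # Distinct distances, smallest first; drop the local distance (10).
    levels = sorted({d for row in dist for d in row})[1:]
    return {
        node: [
            sorted(j for j, d in enumerate(row) if d <= level)
            for level in levels
        ]
        for node, row in enumerate(dist)
    }


spans = numa_spans(DIST)
for node, per_level in spans.items():
    for depth, span in enumerate(per_level):
        print(f"node {node} domain {depth}: span {span}")
```

For node 0 this gives [0, 1, 3] at the first level and [0, 1, 2, 3] at the second, which lines up with "span 0-1,3" and "span 0-3" for CPU0.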


With patches it looks like:

[ 0.080006] smpboot: Total of 4 processors activated (22345.79 BogoMIPS)
[ 0.082545] CPU0 attaching sched-domain:
[ 0.083007] domain 0: span 0-1,3 level NUMA
[ 0.084004] groups: 0 1 3
[ 0.085004] domain 1: span 0-3 level NUMA
[ 0.086004] groups: 0-1,3 (cpu_capacity = 3072) 1-3 (cpu_capacity = 3072)
[ 0.087007] CPU1 attaching sched-domain:
[ 0.088004] domain 0: span 0-2 level NUMA
[ 0.089004] groups: 1 0 2
[ 0.090004] domain 1: span 0-3 level NUMA
[ 0.091003] groups: 0-2 (cpu_capacity = 3072) 0,2-3 (cpu_capacity = 3072)
[ 0.092008] CPU2 attaching sched-domain:
[ 0.093004] domain 0: span 1-3 level NUMA
[ 0.094004] groups: 2 1 3
[ 0.095004] domain 1: span 0-3 level NUMA
[ 0.096004] groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072)
[ 0.097007] CPU3 attaching sched-domain:
[ 0.098004] domain 0: span 0,2-3 level NUMA
[ 0.099003] groups: 3 0 2
[ 0.100004] domain 1: span 0-3 level NUMA
[ 0.101003] groups: 0,2-3 (cpu_capacity = 3072) 0-2 (cpu_capacity = 3072)
[ 0.102007] span: 0-3 (max cpu_capacity = 1024)
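
One visible difference between the two dumps is in domain 1: with the patches, the first group listed for each CPU equals that CPU's domain 0 span, while without them it does not for CPU1 and CPU3. The sketch below only checks that observation against values copied by hand from the two logs. It makes no claim about what the patches actually do, and BEFORE, AFTER, parse_cpulist and mismatched are hypothetical names.

```python
# cpu -> (domain 0 span, first group listed for domain 1),
# copied by hand from the two dumps above.
BEFORE = {
    0: ("0-1,3", "0-1,3"),
    1: ("0-2", "1-3"),
    2: ("1-3", "1-3"),
    3: ("0,2-3", "0-1,3"),
}
AFTER = {
    0: ("0-1,3", "0-1,3"),
    1: ("0-2", "0-2"),
    2: ("1-3", "1-3"),
    3: ("0,2-3", "0,2-3"),
}


def parse_cpulist(text):
    """Turn a cpulist string such as '0-1,3' into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def mismatched(table):
    """CPUs whose first domain 1 group differs from their domain 0 span."""
    return [
        cpu
        for cpu, (span, group) in sorted(table.items())
        if parse_cpulist(span) != parse_cpulist(group)
    ]


print("without patches:", mismatched(BEFORE))
print("with patches:   ", mismatched(AFTER))
```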


Now let me try and reverse engineer those patches ..