Re: sched: tweak select_idle_sibling to look for idle threads

From: Mike Galbraith
Date: Fri May 06 2016 - 14:54:45 EST


On Thu, 2016-05-05 at 23:03 +0100, Matt Fleming wrote:

> One thing I haven't yet done is twiddled the bits individually to see
> what the best combination is. Have you settled on the right settings
> yet?

Lighter configs, with "sched/fair: Fix fairness issue on migration"
reverted, then twiddled knobs. I added an IDLE_SIBLING knob to ~virgin
master (only sorta virgin because I always throttle nohz).

1 x i4790
master
for i in 1 2 4 8; do tbench.sh $i 30 2>&1|grep Throughput; done
Throughput 871.785 MB/sec 1 clients 1 procs max_latency=0.324 ms
Throughput 1514.5 MB/sec 2 clients 2 procs max_latency=0.411 ms
Throughput 2722.43 MB/sec 4 clients 4 procs max_latency=2.400 ms
Throughput 4334.46 MB/sec 8 clients 8 procs max_latency=3.561 ms

echo NO_IDLE_SIBLING > /sys/kernel/debug/sched_features
Throughput 1078.69 MB/sec 1 clients 1 procs max_latency=2.274 ms
Throughput 2130.33 MB/sec 2 clients 2 procs max_latency=1.451 ms
Throughput 3484.18 MB/sec 4 clients 4 procs max_latency=3.430 ms
Throughput 4423.69 MB/sec 8 clients 8 procs max_latency=5.363 ms


masterx
for i in 1 2 4 8; do tbench.sh $i 30 2>&1|grep Throughput; done
Throughput 707.673 MB/sec 1 clients 1 procs max_latency=2.279 ms
Throughput 1503.55 MB/sec 2 clients 2 procs max_latency=0.695 ms
Throughput 2527.73 MB/sec 4 clients 4 procs max_latency=2.321 ms
Throughput 4291.26 MB/sec 8 clients 8 procs max_latency=3.815 ms

echo NO_IDLE_CPU > /sys/kernel/debug/sched_features
for i in 1 2 4 8; do tbench.sh $i 30 2>&1|grep Throughput; done
Throughput 865.936 MB/sec 1 clients 1 procs max_latency=0.411 ms
Throughput 1586.41 MB/sec 2 clients 2 procs max_latency=2.293 ms
Throughput 2638.39 MB/sec 4 clients 4 procs max_latency=2.037 ms
Throughput 4405.43 MB/sec 8 clients 8 procs max_latency=3.581 ms

+ echo NO_AVG_CPU > /sys/kernel/debug/sched_features
+ echo IDLE_SMT > /sys/kernel/debug/sched_features
Throughput 697.126 MB/sec 1 clients 1 procs max_latency=2.220 ms
Throughput 1562.82 MB/sec 2 clients 2 procs max_latency=0.526 ms
Throughput 2620.62 MB/sec 4 clients 4 procs max_latency=6.460 ms
Throughput 4345.13 MB/sec 8 clients 8 procs max_latency=27.921 ms


4 x E7-8890
master
for i in 1 2 4 8 16 32 64 128 256; do tbench.sh $i 30 2>&1| grep Throughput; done
Throughput 615.663 MB/sec 1 clients 1 procs max_latency=0.087 ms
Throughput 1171.53 MB/sec 2 clients 2 procs max_latency=0.087 ms
Throughput 2251.22 MB/sec 4 clients 4 procs max_latency=0.078 ms
Throughput 4090.76 MB/sec 8 clients 8 procs max_latency=0.801 ms
Throughput 7695.92 MB/sec 16 clients 16 procs max_latency=0.235 ms
Throughput 15152 MB/sec 32 clients 32 procs max_latency=0.693 ms
Throughput 21628.2 MB/sec 64 clients 64 procs max_latency=4.666 ms
Throughput 43185.7 MB/sec 128 clients 128 procs max_latency=7.280 ms
Throughput 72144.5 MB/sec 256 clients 256 procs max_latency=8.194 ms

echo NO_IDLE_SIBLING > /sys/kernel/debug/sched_features
Throughput 954.593 MB/sec 1 clients 1 procs max_latency=0.185 ms
Throughput 1882.65 MB/sec 2 clients 2 procs max_latency=0.278 ms
Throughput 3457.03 MB/sec 4 clients 4 procs max_latency=0.431 ms
Throughput 6279.38 MB/sec 8 clients 8 procs max_latency=0.730 ms
Throughput 11170.4 MB/sec 16 clients 16 procs max_latency=0.500 ms
Throughput 21940.9 MB/sec 32 clients 32 procs max_latency=0.475 ms
Throughput 41738.8 MB/sec 64 clients 64 procs max_latency=3.669 ms
Throughput 67634.6 MB/sec 128 clients 128 procs max_latency=6.676 ms
Throughput 76299.7 MB/sec 256 clients 256 procs max_latency=7.878 ms

masterx
for i in 1 2 4 8 16 32 64 128 256; do tbench.sh $i 30 2>&1| grep Throughput; done
Throughput 587.956 MB/sec 1 clients 1 procs max_latency=0.124 ms
Throughput 1140.16 MB/sec 2 clients 2 procs max_latency=0.476 ms
Throughput 2296.03 MB/sec 4 clients 4 procs max_latency=0.142 ms
Throughput 4116.65 MB/sec 8 clients 8 procs max_latency=0.464 ms
Throughput 7820.27 MB/sec 16 clients 16 procs max_latency=0.238 ms
Throughput 14899.2 MB/sec 32 clients 32 procs max_latency=0.321 ms
Throughput 21909.8 MB/sec 64 clients 64 procs max_latency=0.905 ms
Throughput 35495.2 MB/sec 128 clients 128 procs max_latency=6.158 ms
Throughput 75863.2 MB/sec 256 clients 256 procs max_latency=7.650 ms

echo NO_IDLE_CPU > /sys/kernel/debug/sched_features
Throughput 555.15 MB/sec 1 clients 1 procs max_latency=0.096 ms
Throughput 1195.12 MB/sec 2 clients 2 procs max_latency=0.131 ms
Throughput 2276.97 MB/sec 4 clients 4 procs max_latency=0.105 ms
Throughput 4248.14 MB/sec 8 clients 8 procs max_latency=0.131 ms
Throughput 7860.86 MB/sec 16 clients 16 procs max_latency=0.210 ms
Throughput 15178.6 MB/sec 32 clients 32 procs max_latency=0.229 ms
Throughput 21523.9 MB/sec 64 clients 64 procs max_latency=0.842 ms
Throughput 31082.1 MB/sec 128 clients 128 procs max_latency=7.311 ms
Throughput 75887.9 MB/sec 256 clients 256 procs max_latency=7.764 ms

+ echo NO_AVG_CPU > /sys/kernel/debug/sched_features
Throughput 598.063 MB/sec 1 clients 1 procs max_latency=0.131 ms
Throughput 1140.2 MB/sec 2 clients 2 procs max_latency=0.092 ms
Throughput 2268.68 MB/sec 4 clients 4 procs max_latency=0.170 ms
Throughput 4259.7 MB/sec 8 clients 8 procs max_latency=0.212 ms
Throughput 7904.15 MB/sec 16 clients 16 procs max_latency=0.191 ms
Throughput 14840 MB/sec 32 clients 32 procs max_latency=0.279 ms
Throughput 21701.5 MB/sec 64 clients 64 procs max_latency=0.856 ms
Throughput 38945 MB/sec 128 clients 128 procs max_latency=7.501 ms
Throughput 75669.4 MB/sec 256 clients 256 procs max_latency=14.984 ms

+ echo IDLE_SMT > /sys/kernel/debug/sched_features
Throughput 592.799 MB/sec 1 clients 1 procs max_latency=0.120 ms
Throughput 1208.28 MB/sec 2 clients 2 procs max_latency=0.078 ms
Throughput 2319.22 MB/sec 4 clients 4 procs max_latency=0.141 ms
Throughput 4196.64 MB/sec 8 clients 8 procs max_latency=0.253 ms
Throughput 7816.47 MB/sec 16 clients 16 procs max_latency=0.117 ms
Throughput 14990.8 MB/sec 32 clients 32 procs max_latency=0.189 ms
Throughput 21809.4 MB/sec 64 clients 64 procs max_latency=0.832 ms
Throughput 44813 MB/sec 128 clients 128 procs max_latency=7.930 ms
Throughput 75978.1 MB/sec 256 clients 256 procs max_latency=7.337 ms
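
For anyone wanting to compare two of the runs above side by side, here is a
minimal sketch that parses the "Throughput" lines and reports the per-client
throughput change in percent. The helper names (parse_tbench,
throughput_delta_pct) are hypothetical and not part of tbench or the kernel
tree; the sample input is the i4790 master run and its NO_IDLE_SIBLING
counterpart, copied from the numbers above.

```python
import re

# Matches lines such as:
#   Throughput 871.785 MB/sec 1 clients 1 procs max_latency=0.324 ms
LINE_RE = re.compile(
    r"Throughput\s+([\d.]+)\s+MB/sec\s+(\d+)\s+clients\s+\d+\s+procs\s+"
    r"max_latency=([\d.]+)\s+ms"
)


def parse_tbench(text):
    """Return {clients: (throughput_mb_per_sec, max_latency_ms)} for one run."""
    results = {}
    for m in LINE_RE.finditer(text):
        results[int(m.group(2))] = (float(m.group(1)), float(m.group(3)))
    return results


def throughput_delta_pct(base, other):
    """Percent throughput change of `other` relative to `base`, per client count."""
    return {
        n: round(100.0 * (other[n][0] - base[n][0]) / base[n][0], 1)
        for n in sorted(base)
        if n in other
    }


# i4790, master (numbers copied from the run above)
master = parse_tbench("""
Throughput 871.785 MB/sec 1 clients 1 procs max_latency=0.324 ms
Throughput 1514.5 MB/sec 2 clients 2 procs max_latency=0.411 ms
Throughput 2722.43 MB/sec 4 clients 4 procs max_latency=2.400 ms
Throughput 4334.46 MB/sec 8 clients 8 procs max_latency=3.561 ms
""")

# i4790, master with NO_IDLE_SIBLING (numbers copied from the run above)
no_idle_sibling = parse_tbench("""
Throughput 1078.69 MB/sec 1 clients 1 procs max_latency=2.274 ms
Throughput 2130.33 MB/sec 2 clients 2 procs max_latency=1.451 ms
Throughput 3484.18 MB/sec 4 clients 4 procs max_latency=3.430 ms
Throughput 4423.69 MB/sec 8 clients 8 procs max_latency=5.363 ms
""")

if __name__ == "__main__":
    for clients, pct in throughput_delta_pct(master, no_idle_sibling).items():
        print(f"{clients:>3} clients: {pct:+.1f}%")
```

The same two helpers should work unchanged on the E7-8890 runs, since the
parser keys on the client count rather than on line position. Note these are
single 30 second runs, so small deltas may well be noise.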