Re: [PATCH v6 2/2] sched/fair: Introduce SIS_SHORT to wake up short task on current CPU

From: Peter Zijlstra
Date: Wed Mar 15 2023 - 11:27:12 EST


On Wed, Feb 22, 2023 at 10:09:55PM +0800, Chen Yu wrote:

> will-it-scale
> =============
> case             load        baseline  compare%
> context_switch1  224 groups  1.00      +946.68%
>
> There is a huge improvement in the fast context switch test case,
> especially when the number of groups equals the number of CPUs.
>
> netperf
> =======
> case    load         baseline(std%)  compare%(std%)
> TCP_RR  56-threads   1.00 ( 1.12)      -0.05 ( 0.97)
> TCP_RR  112-threads  1.00 ( 0.50)      +0.31 ( 0.35)
> TCP_RR  168-threads  1.00 ( 3.46)      +5.50 ( 2.08)
> TCP_RR  224-threads  1.00 ( 2.52)    +665.38 ( 3.38)
> TCP_RR  280-threads  1.00 (38.59)     +22.12 (11.36)
> TCP_RR  336-threads  1.00 (15.88)      -0.00 (19.96)
> TCP_RR  392-threads  1.00 (27.22)      +0.26 (24.26)
> TCP_RR  448-threads  1.00 (37.88)      +0.04 (27.87)
> UDP_RR  56-threads   1.00 ( 2.39)      -0.36 ( 8.33)
> UDP_RR  112-threads  1.00 (22.62)      -0.65 (24.66)
> UDP_RR  168-threads  1.00 (15.72)      +3.97 ( 5.02)
> UDP_RR  224-threads  1.00 (15.90)    +134.98 (28.59)
> UDP_RR  280-threads  1.00 (32.43)      +0.26 (29.68)
> UDP_RR  336-threads  1.00 (39.21)      -0.05 (39.71)
> UDP_RR  392-threads  1.00 (31.76)      -0.22 (32.00)
> UDP_RR  448-threads  1.00 (44.90)      +0.06 (31.83)
>
> There is a significant 600+% improvement for TCP_RR and 100+% for UDP_RR
> when the number of threads equals the number of CPUs.
>
> tbench
> ======
> case      load         baseline(std%)  compare%(std%)
> loopback  56-threads   1.00 ( 0.15)     +0.88 ( 0.08)
> loopback  112-threads  1.00 ( 0.06)     -0.41 ( 0.52)
> loopback  168-threads  1.00 ( 0.17)    +45.42 (39.54)
> loopback  224-threads  1.00 (36.93)    +24.10 ( 0.06)
> loopback  280-threads  1.00 ( 0.04)     -0.04 ( 0.04)
> loopback  336-threads  1.00 ( 0.06)     -0.16 ( 0.14)
> loopback  392-threads  1.00 ( 0.05)     +0.06 ( 0.02)
> loopback  448-threads  1.00 ( 0.07)     -0.02 ( 0.07)
>
> There is no noticeable impact on tbench, although there is run-to-run
> variance in the 168/224-thread cases, with or without this patch applied.

So there is a very narrow, but significant, win at 4x overload.
What about 3x/5x overload? Those only show very marginal gains.
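
To make the multiples concrete, here is a rough sketch that re-reads the
quoted netperf compare% columns as load multiples. It assumes the
56-thread row is the 1x step that 3x/4x/5x count from; BASE_THREADS,
gains_by_multiple() and peak() are names made up for this illustration,
and the numbers are copied straight from the table above.

```python
# compare% values copied from the quoted netperf table, keyed by thread count.
# ASSUMPTION: the smallest load step (56 threads) is treated as 1x.
BASE_THREADS = 56

TCP_RR = {56: -0.05, 112: 0.31, 168: 5.50, 224: 665.38,
          280: 22.12, 336: -0.00, 392: 0.26, 448: 0.04}
UDP_RR = {56: -0.36, 112: -0.65, 168: 3.97, 224: 134.98,
          280: 0.26, 336: -0.05, 392: -0.22, 448: 0.06}


def gains_by_multiple(results, base=BASE_THREADS):
    """Re-key compare% by load multiple (threads / base)."""
    return {threads // base: gain for threads, gain in results.items()}


def peak(results, base=BASE_THREADS):
    """Return (multiple, compare%) for the largest gain in the table."""
    by_mult = gains_by_multiple(results, base)
    mult = max(by_mult, key=by_mult.get)
    return mult, by_mult[mult]


if __name__ == "__main__":
    for name, results in (("TCP_RR", TCP_RR), ("UDP_RR", UDP_RR)):
        by_mult = gains_by_multiple(results)
        for mult in sorted(by_mult):
            print(f"{name} {mult}x: {by_mult[mult]:+.2f}%")
```

Under that assumption the largest gain for both TCP_RR and UDP_RR sits
at the 4x row, with the 3x and 5x rows far below it.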

So these patches are brilliant if you run at exactly 4x overload, and
very meh otherwise.

Why do we care about 4x overload?