Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

From: Mike Galbraith
Date: Fri Apr 26 2024 - 12:04:58 EST


On Fri, 2024-04-26 at 13:16 +0200, Peter Zijlstra wrote:
> >
> > I ended up with the below instead; lemme go run this unixbench
> > spawn on it.
>
> Seems to survive that.
>
> I pushed out the patches with updates to queue/sched/eevdf

Yup, solid... but fwiw, tbench liked the previous version better.

trusty (with a 'c') ole i7-4790 box, tbench 8

for i in 1 2 3; do tbench.sh 8 10 2>&1|grep Throughput; done

6.9.0.gc942a0c-master +eevdf.current
NO_DELAY_DEQUEUE
Throughput 3285.04 MB/sec 8 clients 8 procs max_latency=3.481 ms
Throughput 3289.66 MB/sec 8 clients 8 procs max_latency=8.124 ms
Throughput 3293.83 MB/sec 8 clients 8 procs max_latency=2.210 ms
DELAY_DEQUEUE
Throughput 3246.3 MB/sec 8 clients 8 procs max_latency=2.181 ms
Throughput 3236.96 MB/sec 8 clients 8 procs max_latency=6.988 ms
Throughput 3248.6 MB/sec 8 clients 8 procs max_latency=2.130 ms

6.9.0.gc942a0c-master +eevdf.prev
NO_DELAY_DEQUEUE
Throughput 3457.92 MB/sec 8 clients 8 procs max_latency=3.885 ms
Throughput 3470.95 MB/sec 8 clients 8 procs max_latency=4.475 ms
Throughput 3467.87 MB/sec 8 clients 8 procs max_latency=2.182 ms
DELAY_DEQUEUE
Throughput 3712.96 MB/sec 8 clients 8 procs max_latency=4.231 ms
Throughput 3667.87 MB/sec 8 clients 8 procs max_latency=5.020 ms
Throughput 3679.65 MB/sec 8 clients 8 procs max_latency=2.847 ms

Trees are identical modulo extracted eevdf additions. The previous win
that put eevdf on par with cfs went missing... and then some.

For reference, cfs vs eevdf log extract for previously mentioned gain.

6.1.87-cfs
Throughput 3660.98 MB/sec 8 clients 8 procs max_latency=2.204 ms
Throughput 3678.67 MB/sec 8 clients 8 procs max_latency=10.127 ms
Throughput 3631.89 MB/sec 8 clients 8 procs max_latency=13.019 ms
1.000

6.1.87-eevdf - naked eevdf +fixes
Throughput 3441.86 MB/sec 8 clients 8 procs max_latency=3.943 ms
Throughput 3439.68 MB/sec 8 clients 8 procs max_latency=4.285 ms
Throughput 3432.28 MB/sec 8 clients 8 procs max_latency=3.557 ms
vs cfs .940

6.1.87-eevdf +delay_dequeue.prev patch set
DELAY_DEQUEUE
Throughput 3696.94 MB/sec 8 clients 8 procs max_latency=2.179 ms
Throughput 3694.64 MB/sec 8 clients 8 procs max_latency=6.322 ms
Throughput 3654.49 MB/sec 8 clients 8 procs max_latency=4.101 ms
vs cfs 1.006
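
The "vs cfs" figures above look like mean throughput of the three runs divided by the cfs mean. A minimal sketch of that arithmetic, assuming this reading is right; the helper names `mean_tput` and `ratio` are made up for illustration, and the rounding used for the quoted figures is not stated, so the last digit may differ:

```shell
# mean_tput: average the MB/sec field of tbench "Throughput" lines read on stdin.
mean_tput() {
    awk '/^Throughput/ { sum += $2; n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# ratio BASE NEW: print NEW/BASE to three decimals (BASE being the cfs mean here).
ratio() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.3f\n", new / base }'
}
```

With the three runs of each kernel saved to a file, `ratio "$(mean_tput < cfs.log)" "$(mean_tput < eevdf.log)"` would print the relative figure; the log file names are hypothetical.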

box waxes nostalgic (son, when I was yo age [flex])
4.19.312
Throughput 4099.07 MB/sec 8 clients 8 procs max_latency=2.169 ms
Throughput 4107.49 MB/sec 8 clients 8 procs max_latency=12.404 ms
Throughput 4118.41 MB/sec 8 clients 8 procs max_latency=14.150 ms

-Mike