Re: [PATCH 1/2] sched/fair: Record the average duration of a task

From: Mike Galbraith
Date: Tue Jul 02 2024 - 01:09:13 EST


On Mon, 2024-07-01 at 22:57 +0800, Chen Yu wrote:
> > Just take a look at the high speed ping-pong thing (not a benchmark,
> > that's a box full of tape measures, rather silly, but..).  TCP_RR IS
> > 1:1, has as short a duration as network stack plus scheduler can
> > possibly make it, and is nearly synchronous to boot, two halves of a
> > whole, the ONLY thing you can certainly safely stack..
>
> I agree, this is a limited scenario.
>
> > but a shared L2 box still takes a wee hit when you do so.
>
> According to a test conducted last month on a system with 500+ CPUs where 4 CPUs
> share the same L2 cache, around 20% improvement was noticed (though not as much
> as on the non-L2 shared platform).

This dinky box doesn't have 500 cores, but it's.. aw, adorable :)

rpi4:/root # ONLY=TCP_RR netperf.sh
TCP_RR-1 unbound Avg: 31754 Sum: 31754
TCP_RR-1 stacked Avg: 26625 Sum: 26625
TCP_RR-1 cross-core Avg: 32325 Sum: 32325

rpi4:/root # tbench.sh 1 30 2>&1|grep Throughput
Throughput 139.024 MB/sec 1 clients 1 procs max_latency=1.116 ms
rpi4:/root # taskset -c 3 tbench.sh 1 30 2>&1|grep Throughput
Throughput 116.765 MB/sec 1 clients 1 procs max_latency=0.340 ms
rpi4:/root #

This little box running its stock 6.6.33 distro kernel pulls out a
cross-core win for both maximally synchronous TCP_RR and the a bit
lesser so but still pretty close tbench. The numbers mean little
though, one propagation speed is lovely, but were there more, I'd be as
stuck with them as I am with rpi4's one-speed (all ahead slow) gearbox.

-Mike