Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()

From: Mike Galbraith
Date: Tue Sep 12 2023 - 05:40:30 EST


On Mon, 2023-09-11 at 18:19 +0800, Chen Yu wrote:
>
> > Speaking of cache-hot idle CPU, is netperf actually more happy with
> > piling on current CPU?
>
> Yes. Per my previous test, netperf of TCP_RR/UDP_RR really likes to
> put the waker and wakee together.

Hm, seems there's at least one shared L2 case where that's untrue by
more than a tiny margin, which surprised me rather a lot.

For grins, I tested netperf on my dinky rpi4b, and while its RR numbers
seem kinda odd, they're also seemingly repeatable (ergo showing them).
I measured a very modest cross-core win on a shared L2 Intel CPU some
years ago (when Q6600 was shiny/new) but nothing close to these deltas.

Makes me wonder what (a tad beefier) Bulldog RR numbers look like.

root@rpi4:~# ONLY=TCP_RR netperf.sh
TCP_RR-1        unbound    Avg:  29611  Sum:  29611
TCP_RR-1        stacked    Avg:  22540  Sum:  22540
TCP_RR-1        cross-core Avg:  30181  Sum:  30181

root@rpi4:~# netperf.sh
TCP_SENDFILE-1  unbound    Avg:  15572  Sum:  15572
TCP_SENDFILE-1  stacked    Avg:  11533  Sum:  11533
TCP_SENDFILE-1  cross-core Avg:  15751  Sum:  15751

TCP_STREAM-1    unbound    Avg:   6331  Sum:   6331
TCP_STREAM-1    stacked    Avg:   6031  Sum:   6031
TCP_STREAM-1    cross-core Avg:   6211  Sum:   6211

TCP_MAERTS-1    unbound    Avg:   6306  Sum:   6306
TCP_MAERTS-1    stacked    Avg:   6094  Sum:   6094
TCP_MAERTS-1    cross-core Avg:   9393  Sum:   9393

UDP_STREAM-1    unbound    Avg:  22277  Sum:  22277
UDP_STREAM-1    stacked    Avg:  18844  Sum:  18844
UDP_STREAM-1    cross-core Avg:  24749  Sum:  24749

TCP_RR-1        unbound    Avg:  29674  Sum:  29674
TCP_RR-1        stacked    Avg:  22267  Sum:  22267
TCP_RR-1        cross-core Avg:  30237  Sum:  30237

UDP_RR-1        unbound    Avg:  36189  Sum:  36189
UDP_RR-1        stacked    Avg:  27129  Sum:  27129
UDP_RR-1        cross-core Avg:  37033  Sum:  37033
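
To put the deltas in relative terms, here is a minimal sketch that computes the cross-core gain over stacked for each test, using the Avg figures from the full netperf.sh run above. The names `results` and `cross_core_gain_pct` are illustrative only and are not part of netperf.sh; the sketch assumes higher Avg is better for every test listed.

```python
# Avg figures copied from the full netperf.sh run above (one instance each).
# Assumption: higher is better for every test listed.
results = {
    "TCP_SENDFILE": {"unbound": 15572, "stacked": 11533, "cross-core": 15751},
    "TCP_STREAM":   {"unbound": 6331,  "stacked": 6031,  "cross-core": 6211},
    "TCP_MAERTS":   {"unbound": 6306,  "stacked": 6094,  "cross-core": 9393},
    "UDP_STREAM":   {"unbound": 22277, "stacked": 18844, "cross-core": 24749},
    "TCP_RR":       {"unbound": 29674, "stacked": 22267, "cross-core": 30237},
    "UDP_RR":       {"unbound": 36189, "stacked": 27129, "cross-core": 37033},
}


def cross_core_gain_pct(row):
    """Percent by which cross-core placement beats stacked placement."""
    return 100.0 * (row["cross-core"] - row["stacked"]) / row["stacked"]


for test, row in results.items():
    print(f"{test:<13} cross-core vs stacked: {cross_core_gain_pct(row):+6.1f}%")
```

On these numbers the two RR tests both come out around a third better cross-core than stacked, while TCP_STREAM moves by only a few percent.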