Hm. I piddled with pgbench, which doesn't seem to collapse into a
quivering heap when load exceeds cores these days. Deltas weren't all
that impressive, but it does appreciate the extra effort a bit (about 7%
on average), and a bit more (about 11%) when clients receive it as well.
If you test, and have time to piddle, you could try letting wake_wide()
return 1 + sched_feat(WAKE_WIDE_IDLE) unconditionally, instead of adding
it only if the wakee is the dispatcher.
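Numbers from my little desktop box. Trailing columns on the last line of
each run are the average tps, then its ratio to the NO_WAKE_WIDE_IDLE
average, then its ratio to the server-only WAKE_WIDE_IDLE average.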
NO_WAKE_WIDE_IDLE
postgres@homer:~> pgbench.sh
clients 8 tps = 116697.697662
clients 12 tps = 115160.230523
clients 16 tps = 115569.804548
clients 20 tps = 117879.230514
clients 24 tps = 118281.753040
clients 28 tps = 116974.796627
clients 32 tps = 119082.163998 avg 117092.239 1.000
WAKE_WIDE_IDLE
postgres@homer:~> pgbench.sh
clients 8 tps = 124351.735754
clients 12 tps = 124419.673135
clients 16 tps = 125050.716498
clients 20 tps = 124813.042352
clients 24 tps = 126047.442307
clients 28 tps = 125373.719401
clients 32 tps = 126711.243383 avg 125252.510 1.069 1.000
WAKE_WIDE_IDLE (clients as well as server)
postgres@homer:~> pgbench.sh
clients 8 tps = 130539.795246
clients 12 tps = 128984.648554
clients 16 tps = 130564.386447
clients 20 tps = 129149.693118
clients 24 tps = 130211.119780
clients 28 tps = 130325.355433
clients 32 tps = 129585.656963 avg 129908.665 1.109 1.037
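
For anyone wanting to recheck the avg and ratio columns, here is a minimal
sketch that recomputes them from the tps figures above. The names (runs,
trunc3, summarize) are made up for illustration, and truncating to three
decimals rather than rounding is an assumption inferred from the posted
figures, not something taken from pgbench.sh.

```python
import math

# tps figures copied from the three pgbench runs above (clients 8..32)
runs = {
    "NO_WAKE_WIDE_IDLE": [
        116697.697662, 115160.230523, 115569.804548, 117879.230514,
        118281.753040, 116974.796627, 119082.163998,
    ],
    "WAKE_WIDE_IDLE": [
        124351.735754, 124419.673135, 125050.716498, 124813.042352,
        126047.442307, 125373.719401, 126711.243383,
    ],
    "WAKE_WIDE_IDLE (clients as well as server)": [
        130539.795246, 128984.648554, 130564.386447, 129149.693118,
        130211.119780, 130325.355433, 129585.656963,
    ],
}


def trunc3(x):
    """Truncate (not round) to three decimals; assumed to match the post."""
    return math.floor(x * 1000) / 1000


def summarize(runs):
    """Return {name: (avg, ratio to baseline, ratio to server-only run)}."""
    avgs = {name: sum(tps) / len(tps) for name, tps in runs.items()}
    base = avgs["NO_WAKE_WIDE_IDLE"]
    server_only = avgs["WAKE_WIDE_IDLE"]
    return {
        name: (trunc3(avg), trunc3(avg / base), trunc3(avg / server_only))
        for name, avg in avgs.items()
    }


if __name__ == "__main__":
    for name, (avg, vs_base, vs_server) in summarize(runs).items():
        print(f"{name}: avg {avg:.3f} {vs_base:.3f} {vs_server:.3f}")
```

Unlike the listings above, this prints both ratios for every run, so the
baseline also gets a (sub-1.0) ratio against the server-only run.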