Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

From: Mike Galbraith
Date: Mon Jul 06 2015 - 14:36:38 EST


On Mon, 2015-07-06 at 10:34 -0400, Josef Bacik wrote:
> On 07/06/2015 01:13 AM, Mike Galbraith wrote:
> > Hm. Piddling with pgbench, which doesn't seem to collapse into a
> > quivering heap when load exceeds cores these days, deltas weren't all
> > that impressive, but it does appreciate the extra effort a bit, and a
> > bit more when clients receive it as well.
> >
> > If you test, and have time to piddle, you could try letting wake_wide()
> > return 1 + sched_feat(WAKE_WIDE_IDLE) instead of adding only if wakee is
> > the dispatcher.
> >
> > Numbers from my little desktop box.
> >
> > NO_WAKE_WIDE_IDLE
> > postgres@homer:~> pgbench.sh
> > clients 8 tps = 116697.697662
> > clients 12 tps = 115160.230523
> > clients 16 tps = 115569.804548
> > clients 20 tps = 117879.230514
> > clients 24 tps = 118281.753040
> > clients 28 tps = 116974.796627
> > clients 32 tps = 119082.163998 avg 117092.239 1.000
> >
> > WAKE_WIDE_IDLE
> > postgres@homer:~> pgbench.sh
> > clients 8 tps = 124351.735754
> > clients 12 tps = 124419.673135
> > clients 16 tps = 125050.716498
> > clients 20 tps = 124813.042352
> > clients 24 tps = 126047.442307
> > clients 28 tps = 125373.719401
> > clients 32 tps = 126711.243383 avg 125252.510 1.069 1.000
> >
> > WAKE_WIDE_IDLE (clients as well as server)
> > postgres@homer:~> pgbench.sh
> > clients 8 tps = 130539.795246
> > clients 12 tps = 128984.648554
> > clients 16 tps = 130564.386447
> > clients 20 tps = 129149.693118
> > clients 24 tps = 130211.119780
> > clients 28 tps = 130325.355433
> > clients 32 tps = 129585.656963 avg 129908.665 1.109 1.037

I had a typo in my script, so every run in those desktop box numbers
used the same number of clients. It doesn't invalidate anything, but the
row to row deltas within each table are just run to run variance. Not to
mention that a single cache box is not all that interesting for this
anyway; it gets interesting when the interconnect becomes a player.
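
Since every row ran the same client count, each table is effectively
seven repeat runs of one configuration, so the per-table mean and spread
are the numbers worth comparing. A minimal sketch of that arithmetic,
using only the tps values quoted above (the dictionary keys and the
summarize() helper are made up for illustration, not part of pgbench.sh):

```python
from statistics import mean, stdev

# tps values copied from the quoted pgbench.sh output; each list is
# treated as seven repeat runs of the same configuration.
runs = {
    "NO_WAKE_WIDE_IDLE": [
        116697.697662, 115160.230523, 115569.804548, 117879.230514,
        118281.753040, 116974.796627, 119082.163998,
    ],
    "WAKE_WIDE_IDLE": [
        124351.735754, 124419.673135, 125050.716498, 124813.042352,
        126047.442307, 125373.719401, 126711.243383,
    ],
    "WAKE_WIDE_IDLE (clients+server)": [
        130539.795246, 128984.648554, 130564.386447, 129149.693118,
        130211.119780, 130325.355433, 129585.656963,
    ],
}


def summarize(samples):
    """Return {name: (mean tps, sample standard deviation)}."""
    return {name: (mean(tps), stdev(tps)) for name, tps in samples.items()}


summary = summarize(runs)
base_avg = summary["NO_WAKE_WIDE_IDLE"][0]

for name, (avg, sd) in summary.items():
    # Ratio is relative to the NO_WAKE_WIDE_IDLE average.
    print(f"{name:32s} avg {avg:11.3f}  stdev {sd:9.3f}  "
          f"vs base {avg / base_avg:.3f}")
```

The printed ratios may differ in the last digit from the quoted columns
depending on how the original script rounded.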

> I have time for twiddling, we're carrying ye olde WAKE_IDLE until we get
> this solved upstream and then I'll rip out the old and put in the new,
> I'm happy to screw around until we're all happy. I'll throw this in a
> kernel this morning and run stuff today. Barring any issues with the
> testing infrastructure I should have results today. Thanks,

I'll be interested in your results. Taking pgbench to a little NUMA
box, I'm seeing _nada_ outside of variance with master (crap). I have a
way to win significantly for _older_ kernels, and that win over master
_may_ provide some useful insight, but I don't trust postgres/pgbench as
far as I can toss the planet, so I don't have a warm fuzzy about trying
to use it to approximate your real world load.

BTW, what does your topology look like (numactl --hardware)?

-Mike
