Re: [PATCH RESEND] sched: prefer an idle cpu vs an idle sibling for BALANCE_WAKE

From: Mike Galbraith
Date: Fri Jul 03 2015 - 02:41:18 EST


On Thu, 2015-07-02 at 13:44 -0400, Josef Bacik wrote:

> Now for 3.10 vs 4.0 our request duration time is the same if not
> slightly better on 4.0, so once the workers are doing their job
> everything is a-ok.
>
> The problem is the probability the select queue >= 1 is way different on
> 4.0 vs 3.10. Normally this graph looks like an S, it's essentially 0 up
> to some RPS (requests per second) threshold and then shoots up to 100%
> after the threshold. I'll make a table of these graphs that hopefully
> makes sense, the numbers are different from run to run because of
> traffic and such, the test and control are both run at the same time.
> The header is the probability the select queue >=1
>
>                 25%   50%   75%
> 4.0 plain:      371   388   402
> control:        386   394   402
> difference:      15     6     0

So control is 3.10? Virgin?

> So with 4.0 it's basically a straight line; at lower RPS we are getting a
> higher probability of a select queue >= 1. We are measuring the cpu
> delay avg ms thing from the scheduler netlink stuff, which is how I
> noticed it was scheduler related: our cpu delay is way higher on 4.0
> than it is on 3.10 or 4.0 with the wake idle patch.
>
> So the next test is NO_PREFER_IDLE. This is slightly better than 4.0 plain.
>
>                 25%   50%   75%
> NO_PREFER_IDLE: 399   401   414
> control:        385   408   416
> difference:      14     7     2

Hm. Throttling nohz may make a larger delta. But never mind that.

> The numbers don't really show it well, but the graphs are closer
> together; it's slightly more S-shaped, but still not great.
>
> Next is NO_WAKE_WIDE, which is horrible
>
>                 25%   50%   75%
> NO_WAKE_WIDE:   315   344   369
> control:        373   380   388
> difference:      58    36    19
>
> This isn't even in the same ballpark, it's a way worse regression than
> plain.

Ok, this jibes perfectly with the 1:N waker/wakee thingy.

> The next bit is NO_WAKE_WIDE|NO_PREFER_IDLE, which is just as bad
>
>                 25%   50%   75%
> EVERYTHING:     327   360   383
> control:        381   390   399
> difference:      54    30    19

Ditto.

Hm. It seems what this load would like best is for us to detect 1:N and
skip all of the routine gyrations, i.e. move the N (workers) infrequently
and expend search cycles frequently only on the 1 (dispatch).
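
To make the idea concrete, here is a rough userspace sketch, not kernel
code and not a proposed patch. It assumes each task counts how often it
switches to waking a different task, and treats a waker whose count is
well above the wakee's as the "1" of a 1:N relation. Everything named
here (struct task_stats, record_wakeup(), looks_like_one_to_n(),
choose_placement(), the PLACE_* values and the factor argument) is
hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task bookkeeping; NOT the kernel's task_struct. */
struct task_stats {
	int last_wakee;           /* id of the task most recently woken, -1 if none */
	unsigned int wakee_flips; /* times this task switched to a different wakee */
};

/* Hypothetical placement decisions for a task being woken. */
enum placement {
	PLACE_DEFAULT,     /* no 1:N relation seen, do whatever we did before */
	PLACE_KEEP_PREV,   /* wakee is one of the N workers: leave it where it was */
	PLACE_SEARCH_IDLE  /* wakee is the 1 (dispatch): spend cycles on a search */
};

/* Note that 'waker' has just woken the task with id 'wakee_id'. */
static void record_wakeup(struct task_stats *waker, int wakee_id)
{
	if (waker->last_wakee != wakee_id) {
		waker->last_wakee = wakee_id;
		waker->wakee_flips++;
	}
}

/*
 * True if 'one' flips between wakees at least 'factor' times, and at
 * least 'factor' times as often as 'other' does. 'factor' is an assumed
 * tunable, e.g. something related to the number of CPUs sharing a cache.
 */
static bool looks_like_one_to_n(const struct task_stats *one,
				const struct task_stats *other,
				unsigned int factor)
{
	assert(factor > 0);
	return one->wakee_flips >= factor &&
	       one->wakee_flips >= other->wakee_flips * factor;
}

/* Decide how much effort to spend placing 'wakee' when 'waker' wakes it. */
static enum placement choose_placement(const struct task_stats *waker,
					const struct task_stats *wakee,
					unsigned int factor)
{
	if (looks_like_one_to_n(waker, wakee, factor))
		return PLACE_KEEP_PREV;   /* dispatcher waking a worker */
	if (looks_like_one_to_n(wakee, waker, factor))
		return PLACE_SEARCH_IDLE; /* worker waking the dispatcher */
	return PLACE_DEFAULT;
}
```

With one dispatcher waking eight workers in turn and each worker only
ever waking the dispatcher, a factor of 4 classifies the workers as
PLACE_KEEP_PREV and the dispatcher as PLACE_SEARCH_IDLE. A real
implementation would also need the counts to decay over time, which this
sketch leaves out.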

Ponder..

-Mike
