Re: balance storm

From: Peter Zijlstra
Date: Tue May 27 2014 - 06:44:03 EST


On Tue, May 27, 2014 at 12:05:33PM +0200, Mike Galbraith wrote:
> On Tue, 2014-05-27 at 11:48 +0200, Peter Zijlstra wrote:
>
> > So I suppose this is due to the select_idle_sibling() nonsense again,
> > where we assume L3 is a fair compromise between cheap enough and
> > effective enough.
>
> Nodz.
>
> > Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
> > sizes, 8 cores isn't anywhere near their top silly, which shifts the
> > balance, and there's always going to be pathological cases (like the
> > proposed workload) where it's just always going to suck eggs.
>
> Test is as pathological as it gets. 15 core + SMT wouldn't be pretty.

So one thing we could maybe do is measure the cost of
select_idle_sibling(), just like we do for idle_balance(), and compare
this against the task's avg runtime.
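
To make the comparison concrete, here is a rough userspace sketch, not kernel code. Everything in it is an assumption for illustration: struct search_stats, update_search_cost() and search_is_worthwhile() are hypothetical names, and the 1/8 averaging weight is an arbitrary choice. It only shows the shape of the idea: keep a running average of what the search costs and skip the search when that cost is not smaller than the task's average runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for illustration; not an existing kernel structure. */
struct search_stats {
	uint64_t avg_search_cost_ns;	/* running average cost of one search */
};

/* Fold a new cost sample into the running average with weight 1/8. */
static void update_search_cost(struct search_stats *st, uint64_t sample_ns)
{
	int64_t diff;

	if (st->avg_search_cost_ns == 0) {
		/* First sample seeds the average. */
		st->avg_search_cost_ns = sample_ns;
		return;
	}

	diff = (int64_t)sample_ns - (int64_t)st->avg_search_cost_ns;
	st->avg_search_cost_ns =
		(uint64_t)((int64_t)st->avg_search_cost_ns + diff / 8);
}

/*
 * Assumed policy: searching for an idle sibling only pays off when the
 * search is cheaper than the time the task is expected to run.
 */
static bool search_is_worthwhile(const struct search_stats *st,
				 uint64_t task_avg_runtime_ns)
{
	return st->avg_search_cost_ns < task_avg_runtime_ns;
}
```

A caller would time each search, feed the sample to update_search_cost(), and consult search_is_worthwhile() on the next wakeup before searching at all.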

We can go all crazy and do reduced searches; like test every n-th cpu in
the mask, or make it statistical and do a full search every n wakeups.
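
Both variants could look something like the userspace sketch below. It is only an illustration under stated assumptions: the idle set is modelled as a plain 64-bit mask rather than a cpumask, and find_idle_cpu(), pick_idle_cpu(), struct wake_state, stride and full_every are all hypothetical names, not existing scheduler interfaces.

```c
#include <stdint.h>

/* Hypothetical per-domain wakeup counter; not an existing kernel structure. */
struct wake_state {
	unsigned int wakeups;
};

/*
 * Visit only every stride-th CPU (0, stride, 2*stride, ...) and return
 * the first one whose bit is set in idle_mask, or -1 if none is found.
 * The mask is a plain 64-bit word, so at most 64 CPUs are modelled.
 */
static int find_idle_cpu(uint64_t idle_mask, int nr_cpus, int stride)
{
	int cpu;

	if (stride < 1)
		stride = 1;

	for (cpu = 0; cpu < nr_cpus && cpu < 64; cpu += stride) {
		if (idle_mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}

/*
 * Statistical variant: do the cheap strided search on most wakeups and
 * a full search on every full_every-th wakeup. A full_every of 0
 * disables the periodic full search.
 */
static int pick_idle_cpu(struct wake_state *ws, uint64_t idle_mask,
			 int nr_cpus, int stride, unsigned int full_every)
{
	int full = 0;

	ws->wakeups++;
	if (full_every && ws->wakeups % full_every == 0)
		full = 1;

	return find_idle_cpu(idle_mask, nr_cpus, full ? 1 : stride);
}
```

The obvious cost of the strided search is that it misses idle CPUs that fall between the visited positions; the periodic full search bounds how long such a CPU can go unnoticed.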

Not sure what's a good approach. But L3 spanning more and more CPUs is
not something that's going to get cured anytime soon I'm afraid.

Not to mention bloody SMT which makes the whole mess worse.
