Re: balance storm

From: Peter Zijlstra
Date: Tue May 27 2014 - 05:48:31 EST


On Mon, May 26, 2014 at 07:49:10PM +0800, Libo Chen wrote:
> On 2014/5/26 15:56, Mike Galbraith wrote:
> > On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote:
> >> hi,
> >> my box has 16 cpus (E5-2658, 8 cores, 2 threads per core). I did a test on
> >> 3.4.24 stable: start up 50 identical processes, each one being this sample:
> >>
> >> #include <unistd.h>
> >>
> >> int main()
> >> {
> >>     for (;;)
> >>     {
> >>         unsigned int i = 0;
> >>         while (i < 100) {
> >>             i++;
> >>         }
> >>         usleep(100);
> >>     }
> >>
> >>     return 0;
> >> }
> >>
> >> the result is process uses 15% cpu time, perf tool shows 70w (700,000) migrations in 5 seconds.
> >
> > My 8 socket 64 core DL980 running 256 copies (3.14-rt5) munches ~4%/copy
> > per top, and does roughly 1 sh*tload migrations, nano-work loop or not.
> > Turn SD_SHARE_PKG_RESOURCES off at MC (not a noop here), and consumption
> > drops to ~2%/copy, and migrations ('course) mostly go away.

So:

1) what kind of weird ass workload is that? Why are you waking up so
often to do no work?

2) turning on/off share_pkg_resource is a horrid hack whichever way
around you turn it.

So I suppose this is due to the select_idle_sibling() nonsense again,
where we assume L3 is a fair compromise between cheap enough and
effective enough.

Of course, Intel keeps growing the cpu count covered by L3 to ridiculous
sizes, 8 cores is nowhere near their top silly, which shifts the
balance, and there's always going to be pathological cases (like the
proposed workload) where it's just always going to suck eggs.

Also, when running 50 such things on a 16 cpu machine, you get roughly 3
per cpu. Since their runtime is stupid low, I would expect a wakeup to
pretty much always hit an idle cpu, which in turn should inhibit the
migration.
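
To put rough numbers on that, here is a back-of-the-envelope sketch, not a
measurement. The helper names are made up for illustration, and reading the
reported "70w" as 700,000 migrations is an assumption. The wakeup figure is
only an upper bound: it ignores the task's own runtime and any timer slack,
both of which lower the real rate.

```c
#include <assert.h>

/* Average number of tasks per cpu if the tasks were spread evenly. */
double tasks_per_cpu(unsigned int ntasks, unsigned int ncpus)
{
    assert(ncpus > 0);
    return (double)ntasks / ncpus;
}

/*
 * Upper bound on wakeups per second for one task that sleeps
 * sleep_us microseconds between bursts of work.
 */
unsigned long max_wakeups_per_sec(unsigned long sleep_us)
{
    assert(sleep_us > 0);
    return 1000000UL / sleep_us;
}

/* Average migration rate over a measurement window of `seconds`. */
unsigned long migrations_per_sec(unsigned long migrations, unsigned long seconds)
{
    assert(seconds > 0);
    return migrations / seconds;
}
```

With the numbers from this thread, tasks_per_cpu(50, 16) is 3.125,
max_wakeups_per_sec(100) is 10000 per task (so at most 500,000 wakeups/s
across all 50 tasks), and migrations_per_sec(700000, 5) is 140000.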

Then again, maybe the timer slack is causing you grief, resulting in all
3 being woken at the same time, instead of having them staggered.
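
One way to test the timer slack theory might be to shrink the slack in the
test program before it enters its loop and see whether the migration count
changes. A minimal sketch, assuming a Linux box where prctl() supports
PR_SET_TIMERSLACK; the wrapper names are hypothetical and nothing like this
is in the original test program.

```c
#include <sys/prctl.h>

/*
 * Set the calling thread's timer slack, in nanoseconds.
 * Returns 0 on success, -1 on error (errno is set by prctl).
 * Note that a value of 0 resets the slack to the thread's default.
 */
int set_timer_slack_ns(unsigned long ns)
{
    return prctl(PR_SET_TIMERSLACK, ns, 0, 0, 0);
}

/* Read back the calling thread's current timer slack in nanoseconds. */
long get_timer_slack_ns(void)
{
    return prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0);
}
```

Calling set_timer_slack_ns(1) at the top of main() asks for the smallest
slack, so wakeups of different tasks are less likely to be batched onto the
same timer expiry.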

In any case, I'm not sure what the 'regression' report is against, as
there's only a single kernel version mentioned: 3.4, and that's almost a
dinosaur.
