Re: [RFC PATCH 1/2] sched: Clean up SD_BALANCE_WAKE flags in sched domain build-up

From: Mike Galbraith
Date: Thu Jun 02 2016 - 01:50:32 EST


On Thu, 2016-06-02 at 04:03 +0800, Yuyang Du wrote:
> On Wed, Jun 01, 2016 at 11:36:39AM +0200, Mike Galbraith wrote:
> > > Yup. Up to this point, we don't have any disagreement. And I don't think we
> > > have any disagreement conceptually. What the next patch really does is:
> > >
> > > (1) we don't remove SD_BALANCE_WAKE as an important sched_domain flag, on
> > > the contrary, we strengthen it.
> > >
> > > (2) the semantic of SD_BALANCE_WAKE is currently represented by SD_WAKE_AFFINE,
> > > we actually remove this representation.
> >
> > Nope, those two have different meanings. We pass SD_BALANCE_WAKE to
> > identify a ttwu() wakeup, just as we pass SD_BALANCE_FORK to say we're
> > waking a child. SD_WAKE_AFFINE means exactly what it says, but is only
> > applicable to ttwu() wakeups.
>
> I don't disagree, but want to add that, SD_WAKE_AFFINE has no meaning that is so
> special and so important for anyone to use the flag to tune anything. If you want
> to do any SD_BALANCE_*, waker CPU is a valid candidate if allowed, that is it.

That flag lets the user specifically tell us that he doesn't want us to
bounce his tasks around the box, cache misses be damned. The user may
_know_ that say cross node migrations hurt his load more than help, and
not want us to do that, thus expresses himself by turning the flag off
at whatever level. People do that. You can force them to take other
measures, but why do that?

> IIUC your XXX mark and your comment "Prefer wake_affine over balance flags", you
> said the same thing: SD_WAKE_AFFINE should be part of SD_BALANCE_WAKE, and should
> be part of all SD_BALANCE_* flags,

Peter wrote that, but I don't read it the way you do. I read as if the
user wants the benefits of affine wakeups, he surely doesn't want us to
send the wakee off to god know where on every wakeup _instead_ of
waking affine, he wants to balance iff he can't have an affine wakeup.

> > > (3) regarding the semantic of SD_WAKE_AFFINE, it is really not about selecting
> > > waker CPU or about the fast path. Conceptually, it is just saying the waker
> > > CPU is a valid and important candidate if SD_BALANCE_WAKE, which is just so
> > > obvious, so I don't think it deserves to be a separate sched_domain flag.
> >
> > SD_WAKE_AFFINE being a separate domain flag, the user can turn it
> > on/off... separately :)
>
> Sure, that is very true, :) But turning it off for what, that is a big question mark.
> We don't want a flag unless the flag is for goodness, and not a flag with big question
> mark.

It has a good excuse to exist.

> > > (4) the outcome is, if SD_BALANCE_WAKE, we definitely will/should try waker CPU,
> > > and if !SD_BALANCE_WAKE, we don't try waker CPU. So nothing functional is
> > > changed.
> >
> > If wake_wide() says we do not want an affine wakeup, but you apply
> > SD_WAKE_AFFINE meaning to SD_BALANCE_WAKE and turn it on in ->flags,
> > we'll give the user a free sample of full balance cost, no?
>
> Yes, and otherwise we don't select anything? That is just bad engough whether worse
> or not. So the whole fuss I made is really that this is a right thing to start with. :)

Nope, leaving tasks where they were is not a bad thing. Lots of stuff
likes the scheduler best when it leaves them the hell alone :) That
works out well all around, balance cycles are spent in userspace
instead, scheduler produces wins by doing nothing, perfect.

-Mike