Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance
From: Phil Auld
Date: Wed Oct 30 2019 - 13:19:29 EST
Hi,
On Wed, Oct 30, 2019 at 05:35:55PM +0100 Valentin Schneider wrote:
>
>
> On 30/10/2019 17:24, Dietmar Eggemann wrote:
> > On 30.10.19 15:39, Phil Auld wrote:
> >> Hi Vincent,
> >>
> >> On Mon, Oct 28, 2019 at 02:03:15PM +0100 Vincent Guittot wrote:
> >
> > [...]
> >
> >>>> When you say slow versus fast wakeup paths what do you mean? I'm still
> >>>> learning my way around all this code.
> >>>
> >>> When task wakes up, we can decide to
> >>> - speedup the wakeup and shorten the list of cpus and compare only
> >>> prev_cpu vs this_cpu (in fact the group of cpu that share their
> >>> respective LLC). That's the fast wakeup path that is used most of the
> >>> time during a wakeup
> >>> - or start to find the idlest CPU of the system and scan all domains.
> >>> That's the slow path that is used for new tasks or when a task wakes
> >>> up a lot of other tasks at the same time
> >
> > [...]
> >
> > Is the latter related to wake_wide()? If yes, is the SD_BALANCE_WAKE
> > flag set on the sched domains on your machines? IMHO, otherwise those
> > wakeups are not forced into the slowpath (if (unlikely(sd))?
> >
> > I had this discussion the other day with Valentin S. on #sched and we
> > were not sure how SD_BALANCE_WAKE is set on sched domains on
> > !SD_ASYM_CPUCAPACITY systems.
> >
>
> Well from the code nobody but us (asymmetric capacity systems) set
> SD_BALANCE_WAKE. I was however curious if there were some folks who set it
> with out of tree code for some reason.
>
> As Dietmar said, not having SD_BALANCE_WAKE means you'll never go through
> the slow path on wakeups, because there is no domain with SD_BALANCE_WAKE for
> the domain loop to find. Depending on your topology you most likely will
> go through it on fork or exec though.
>
> IOW wake_wide() is not really widening the wakeup scan on wakeups using
> mainline topology code (disregarding asymmetric capacity systems), which
> sounds a bit... off.
Thanks. It's not currently set. I'll set it and re-run to see if it makes
a difference.
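For reference, the domain loop described in the quoted text above can be
modeled roughly as below. This is a simplified userspace sketch, not the
kernel code: the model_* names, the flag values and the spans_prev_cpu
field are made up for illustration, and it leaves out the actual CPU
selection and most of what select_task_rq_fair() does. It only shows why,
with no domain carrying SD_BALANCE_WAKE, a wakeup ends up on the fast path
whether or not wake_wide() fires.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only; not the kernel's. */
enum {
	MODEL_SD_BALANCE_FORK = 1 << 0,
	MODEL_SD_BALANCE_WAKE = 1 << 1,
	MODEL_SD_WAKE_AFFINE  = 1 << 2,
};

/* One sched domain level, lowest level first in the array. */
struct model_domain {
	unsigned int flags;
	bool spans_prev_cpu;	/* assumed: prev_cpu is inside this domain */
};

enum model_path {
	MODEL_FAST_PATH,	/* compare prev_cpu vs this_cpu (LLC scan) */
	MODEL_SLOW_PATH,	/* scan domains for the idlest CPU */
	MODEL_KEEP_CPU,		/* no balancing decision taken */
};

/*
 * sd_flag is the balance type being requested (wake or fork here).
 * want_affine stands in for "!wake_wide(p) and the task may run here".
 */
static enum model_path model_pick_path(const struct model_domain *domains,
				       size_t nr, unsigned int sd_flag,
				       bool want_affine)
{
	const struct model_domain *sd = NULL;

	for (size_t i = 0; i < nr; i++) {
		const struct model_domain *tmp = &domains[i];

		/* An affine wakeup wins over any balance flag. */
		if (want_affine && (tmp->flags & MODEL_SD_WAKE_AFFINE) &&
		    tmp->spans_prev_cpu) {
			sd = NULL;
			break;
		}

		/* Remember the highest domain carrying the requested flag. */
		if (tmp->flags & sd_flag)
			sd = tmp;
		else if (!want_affine)
			break;
	}

	if (sd)
		return MODEL_SLOW_PATH;
	if (sd_flag & MODEL_SD_BALANCE_WAKE)
		return MODEL_FAST_PATH;
	return MODEL_KEEP_CPU;
}
```

With domains that have only the fork and affine flags set, calling
model_pick_path() with MODEL_SD_BALANCE_WAKE returns MODEL_FAST_PATH for
both values of want_affine. Adding MODEL_SD_BALANCE_WAKE to the domains
makes the want_affine == false case return MODEL_SLOW_PATH.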
However, I'm not sure why it would be making a difference for only the cgroup
case. If this is causing issues I'd expect it to affect both runs.
In general I think these threads want to wake up on the last cpu they were on.
And given there are fewer cpu bound tasks than CPUs, that wake cpu should,
more often than not, be idle.
Cheers,
Phil
--