Re: [RFC 1/2] sched: reduce migration cost between faster caches for idle_balance

From: Mike Galbraith
Date: Thu Feb 15 2018 - 13:08:18 EST


On Thu, 2018-02-15 at 11:35 -0500, Steven Sistare wrote:
> On 2/10/2018 1:37 AM, Mike Galbraith wrote:
> > On Fri, 2018-02-09 at 11:08 -0500, Steven Sistare wrote:
> >>>> @@ -8804,7 +8803,8 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
> >>>> if (!(sd->flags & SD_LOAD_BALANCE))
> >>>> continue;
> >>>>
> >>>> - if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> >>>> + if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost +
> >>>> + sd->sched_migration_cost) {
> >>>> update_next_balance(sd, &next_balance);
> >>>> break;
> >>>> }
> >>>
> >>> Ditto.
> >>
> >> The old code did not migrate if the expected costs exceeded the expected idle
> >> time. The new code just adds the sd-specific penalty (essentially loss of cache
> >> footprint) to the costs. The for_each_domain loop visit smallest to largest
> >> sd's, hence visiting smallest to largest migration costs (though the tunables do
> >> not enforce an ordering), and bails at the first sd where the total cost is a lose.
> >
> > Hrm..
> >
> > You're now adding a hypothetical cost to the measured cost of running
> > the LB machinery, which implies that the measurement is insufficient,
> > but you still don't say why it is insufficient.  What happens if you
> > don't do that?  I ask, because when I removed the...
> >
> >    this_rq->avg_idle < sysctl_sched_migration_cost
> >
> > ...bits to check removal effect for Peter, the original reason for it
> > being added did not re-materialize, making me wonder why you need to
> > make this cutoff more aggressive.
>
> The current code with sysctl_sched_migration_cost discourages migration
> too much, per our test results.

That's why I asked you what happens if you only whack the _apparently_
(but maybe not) obsolete old throttle, it appeared likely that your win
came from allowing a bit more migration than the simple throttle
allowed, which if true, would obviate the need for anything more.

> Can you provide more details on the sysbench oltp test that motivated you
> to add sysctl_sched_migration_cost to idle_balance, so Rohit can re-test it?

The problem at that time was the cycle overhead of entering that LB
path at high frequency.  Dirt simple.

-Mike