Re: [RFC][PATCH v4 3/3] sched: Periodically decay max cost of idlebalance

From: Mike Galbraith
Date: Mon Sep 09 2013 - 21:40:49 EST


On Mon, 2013-09-09 at 14:07 -0700, Jason Low wrote:
> On Mon, 2013-09-09 at 13:49 +0200, Peter Zijlstra wrote:
> > On Wed, Sep 04, 2013 at 12:10:01AM -0700, Jason Low wrote:
> > > On Fri, 2013-08-30 at 12:18 +0200, Peter Zijlstra wrote:
> > > > On Thu, Aug 29, 2013 at 01:05:36PM -0700, Jason Low wrote:
> > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > > > index 58b0514..bba5a07 100644
> > > > > --- a/kernel/sched/core.c
> > > > > +++ b/kernel/sched/core.c
> > > > > @@ -1345,7 +1345,7 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
> > > > >
> > > > >  	if (rq->idle_stamp) {
> > > > >  		u64 delta = rq_clock(rq) - rq->idle_stamp;
> > > > > -		u64 max = 2*rq->max_idle_balance_cost;
> > > > > +		u64 max = 2*(sysctl_sched_migration_cost + rq->max_idle_balance_cost);
> > > >
> > > > You re-introduce sched_migration_cost here because max_idle_balance_cost
> > > > can now drop down to 0 again?
> > >
> > > Yes it was so that max_idle_balance_cost would be at least sched_migration_cost
> > > and that we would still skip idle_balance if avg_idle < sched_migration_cost.
> > >
> > > I also initially thought that adding sched_migration_cost would also account for
> > > the extra "costs" of idle balancing that are not accounted for in the time spent
> > > on each newidle load balance. Come to think of it though, sched_migration_cost
> > > might be too large when used in that context considering we're already using the
> > > max cost.
> >
> > Right, so shall we do as Srikar suggests and drop that initial check?
>
> I agree that we can delete the check between avg_idle and max_idle_balance_cost
> so that large costs in higher domains don't cause balancing to be skipped in
> lower domains as Srikar suggested. Should we keep the old
> "if (this_rq->avg_idle < sysctl_sched_migration_cost)" check?

It was put there to allow cross core scheduling to recover as much
overlap as possible, so rapidly switching communicating tasks with only
small recoverable overlap in the first place don't get pounded to pulp
by overhead instead. If a different way does a better job, whack it.
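
For anyone following along, here is a rough user-space sketch of the two
pieces under discussion: the avg_idle clamp from the quoted hunk and the old
"avg_idle < sysctl_sched_migration_cost" check. It is only a model, not the
kernel code. struct rq_model, account_idle_period() and
should_skip_idle_balance() are made-up names for illustration, the 500000 ns
value is the default migration cost, and the average uses a division where
the kernel's update_avg() uses a shift, so rounding may differ slightly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default of sysctl_sched_migration_cost: 500000 ns (0.5 ms). */
#define MIGRATION_COST_NS 500000ULL

/* Hypothetical stand-in for the few struct rq fields discussed here. */
struct rq_model {
	uint64_t avg_idle;              /* running average idle duration, ns */
	uint64_t max_idle_balance_cost; /* max observed newidle balance cost, ns */
};

/* Running average giving the new sample a weight of 1/8. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg = (uint64_t)((int64_t)*avg + diff / 8);
}

/*
 * Wakeup path: fold the idle period that just ended into avg_idle, then
 * clamp it as in the "+" line of the quoted hunk.
 */
static void account_idle_period(struct rq_model *rq, uint64_t idle_ns)
{
	uint64_t max = 2 * (MIGRATION_COST_NS + rq->max_idle_balance_cost);

	update_avg(&rq->avg_idle, idle_ns);
	if (rq->avg_idle > max)
		rq->avg_idle = max;
}

/*
 * The old check: skip newidle balancing when this CPU is expected to stay
 * idle for less than the migration cost.
 */
static bool should_skip_idle_balance(const struct rq_model *rq)
{
	return rq->avg_idle < MIGRATION_COST_NS;
}
```

Feed this a stream of very short idle periods (two tasks ping-ponging, say)
and avg_idle decays below the migration cost within a handful of wakeups,
after which should_skip_idle_balance() returns true and the balance attempt
is skipped.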

-Mike
