Re: [2/2] sched/fair: Fix O(# total cgroups) in load balance path

From: Tejun Heo
Date: Mon May 01 2017 - 15:11:32 EST


Hello, Peter.

On Mon, May 01, 2017 at 06:11:58PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 25, 2017 at 05:43:50PM -0700, Tejun Heo wrote:
> > @@ -7007,6 +7008,14 @@ static void update_blocked_averages(int
> >  		se = cfs_rq->tg->se[cpu];
> >  		if (se && !skip_blocked_update(se))
> >  			update_load_avg(se, 0);
> > +
> > +		/*
> > +		 * There can be a lot of idle CPU cgroups. Don't let fully
> > +		 * decayed cfs_rqs linger on the list.
> > +		 */
> > +		if (!cfs_rq->load.weight && !cfs_rq->avg.load_sum &&
> > +		    !cfs_rq->avg.util_sum && !cfs_rq->runnable_load_sum)
> > +			list_del_leaf_cfs_rq(cfs_rq);
> >  	}
> >  	rq_unlock_irqrestore(rq, &rf);
> >  }
>
> Right this is a 'known' issue and we recently talked about this.
>
> I think you got the condition right, we want to wait for all the stuff
> to be decayed out before taking it off the list.
>
> The only 'problem', which Vincent mentioned in that other thread, is that
> NOHZ idle doesn't guarantee decay -- then again, you don't want to go
> wake a CPU just to decay this crud either. And if we're idle, the list
> being long doesn't matter either.

The list staying long is fine as long as nobody walks it; however, the
list can be *really* long, e.g. hundreds of thousands of entries, so
walking it repeatedly isn't a good idea even if the system is idle. As
long as NOHZ decays and trims the list whenever it ends up walking it,
and AFAICS it does, it should be fine.
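
To make the "decay and trim while walking" point concrete, here is a
rough userspace sketch. It is not the kernel code: struct toy_cfs_rq,
toy_fully_decayed() and toy_walk_and_trim() are made-up names, the
halving step is only a stand-in for the real PELT decay, and the toy
tracks fewer sums than the patch checks. It only shows that each walk
decays idle entries and unlinks the ones that have reached zero, so
later walks get shorter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq on the leaf list; not the kernel struct. */
struct toy_cfs_rq {
	unsigned long weight;		/* nonzero while something is queued */
	unsigned long load_sum;		/* decays toward zero while idle */
	unsigned long util_sum;		/* decays toward zero while idle */
	struct toy_cfs_rq *next;
};

/* Same shape as the condition in the patch: everything must be zero. */
static bool toy_fully_decayed(const struct toy_cfs_rq *c)
{
	return !c->weight && !c->load_sum && !c->util_sum;
}

/*
 * Walk the list once.  Idle entries are decayed (halving is an arbitrary
 * assumption for illustration) and fully decayed entries are unlinked.
 * Returns the number of entries still on the list afterwards.
 */
static size_t toy_walk_and_trim(struct toy_cfs_rq **head)
{
	struct toy_cfs_rq **pp = head;
	size_t remaining = 0;

	while (*pp) {
		struct toy_cfs_rq *c = *pp;

		if (!c->weight) {
			c->load_sum /= 2;
			c->util_sum /= 2;
		}

		if (toy_fully_decayed(c)) {
			*pp = c->next;		/* unlink, do not advance pp */
			c->next = NULL;
		} else {
			pp = &c->next;
			remaining++;
		}
	}

	return remaining;
}
```

With one busy and one idle entry, the idle one drops off after a couple
of walks and the busy one stays, which is the behavior we want from
whichever path ends up walking the list.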

Thanks.

--
tejun