Re: [PATCH] sched: fix infinity loop in update_blocked_averages
From: Vincent Guittot
Date: Fri Dec 28 2018 - 12:25:51 EST
On Fri, 28 Dec 2018 at 17:54, Tejun Heo <tj@xxxxxxxxxx> wrote:
> On Fri, Dec 28, 2018 at 10:30:07AM +0100, Vincent Guittot wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d1907506318a..88b9118b5191 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7698,7 +7698,8 @@ static void update_blocked_averages(int cpu)
> > > * There can be a lot of idle CPU cgroups. Don't let fully
> > > * decayed cfs_rqs linger on the list.
> > > */
> > > - if (cfs_rq_is_decayed(cfs_rq))
> > > + if (cfs_rq_is_decayed(cfs_rq) &&
> > > + rq->tmp_alone_branch == &rq->leaf_cfs_rq_list)
> > > list_del_leaf_cfs_rq(cfs_rq);
> > This patch reduces the cases, but I don't think it's enough because it
> > doesn't cover the case of unregister_fair_sched_group().
> > And we can still break the ordering of the cfs_rq list.
> So, if unregister_fair_sched_group() can corrupt list, the bug is
> there regardless of a9e7f6544b9ce, right?
I don't think so, because without a9e7f6544b9ce the insertion in the
list is done only once, and we can't call unregister_fair_sched_group()
while an enqueue is ongoing, so tmp_alone_branch always points to
rq->leaf_cfs_rq_list. a9e7f6544b9ce makes it possible for
tmp_alone_branch to not point to rq->leaf_cfs_rq_list, and that is when
the list can be corrupted.
> Is there a reason why we're building a dedicated list for avg
> propagation? AFAICS, it's just doing depth first walk, which can be
We have this list of the rq's cfs_rqs so we can update the load of each
cfs_rq and, as a result, the load of each task group. This is then used
to compute the share of a task group between CPUs.
This list must be ordered to correctly propagate the updates from the
leaves to the root.
> done without extra space as long as each node has the parent pointer,
> which they do. Is the dedicated list an optimization?
It avoids having to parse and walk every task_group struct each time.
Instead, you just have to follow a linked list.