Re: regression 4.4: deadlock in with cgroup percpu_rwsem

From: Peter Zijlstra
Date: Wed Jan 20 2016 - 11:49:53 EST


On Wed, Jan 20, 2016 at 11:04:35AM -0500, Tejun Heo wrote:
> On Wed, Jan 20, 2016 at 10:30:07AM -0500, Tejun Heo wrote:
> > > So the current place in free_fair_sched_group() is far too late to be
> > > calling remove_entity_load_avg(). But I'm not sure where I should put
> > > it, it needs to be in a place where we know the group is going to die
> > > but its parent is guaranteed to still exist.
> > >
> > > Would offline be that place?
> >
> > Hmmm... css_free would be with the following patch.
>
> I thought a bit more about this and I think the right thing to do here
> is making both css_offline and css_free follow the ancestry order.
> I'll post a patch to do that soon. offline is called at the head of
> destruction when the css is made invisible and draining of existing
> refs starts. free at the end of that process. Tree ordering
> shouldn't be where the two differ.

OK, that would be good. Meanwhile the above seems to suggest that
css_offline is already hierarchical?
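
The ordering requirement discussed above (a dying group must still be
able to reach its parent) can be illustrated with a minimal userspace
sketch. Everything here is hypothetical: struct toy_group, toy_attach(),
toy_detach(), toy_teardown() and toy_demo() are illustration-only names,
not kernel code, and a "dead" flag stands in for actually freeing memory
so that the ordering mistake stays observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a group in a hierarchy; not a kernel structure. */
struct toy_group {
	struct toy_group *parent;
	long own_load;		/* load this group contributes to its parent */
	long child_load;	/* sum of own_load over children still attached */
	int dead;		/* flag instead of free(), so misuse is detectable */
};

/* Link a group under its parent and account its load there. */
static void toy_attach(struct toy_group *g, struct toy_group *parent, long load)
{
	assert(g != NULL);
	g->parent = parent;
	g->own_load = load;
	g->child_load = 0;
	g->dead = 0;
	if (parent)
		parent->child_load += load;
}

/*
 * Remove a dying group's load from its parent. Returns -1 if the parent is
 * already dead; in real code that access would be a use-after-free.
 */
static int toy_detach(struct toy_group *g)
{
	int ret = 0;

	if (g->parent) {
		if (g->parent->dead)
			ret = -1;
		else
			g->parent->child_load -= g->own_load;
	}
	g->dead = 1;
	return ret;
}

/*
 * Tear down groups[0..n-1]. The array is assumed to list parents before
 * their children, so walking it backwards visits children first.
 * Returns the number of detaches that found a dead parent.
 */
static int toy_teardown(struct toy_group **groups, size_t n, int children_first)
{
	int failures = 0;

	for (size_t i = 0; i < n; i++) {
		size_t idx = children_first ? n - 1 - i : i;

		if (toy_detach(groups[idx]) < 0)
			failures++;
	}
	return failures;
}

/* Build root -> mid -> leaf and tear it down in the requested order. */
static int toy_demo(int children_first)
{
	struct toy_group root, mid, leaf;
	struct toy_group *order[] = { &root, &mid, &leaf };

	toy_attach(&root, NULL, 0);
	toy_attach(&mid, &root, 10);
	toy_attach(&leaf, &mid, 5);
	return toy_teardown(order, 3, children_first);
}
```

In this toy model toy_demo(1) tears down children before parents and
every detach finds a live parent, while toy_demo(0) tears down the root
first and both descendants then find their parent already dead.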

I get the feeling the way sched uses the css_{offline,release,free}
callbacks is sub-optimal. cpu_cgrp_subsys::css_free := sched_destroy_group()
does a call_rcu(), whereas if I read the comment on css_free_work_fn()
correctly, css_free is already called after a grace period, so yet
another one doesn't make sense.
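
To make the cost of a second deferral concrete, here is a small
userspace model, assuming (as the comment seems to say) that the free
callback already runs after one grace period. The names toy_call_rcu(),
toy_grace_period(), toy_css_free() and toy_gps_until_free() are
hypothetical stand-ins; this models only the counting of grace periods
and is not how RCU or the cgroup core is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CBS 8

typedef void (*toy_cb_t)(void *arg);

/* Toy deferral queue: callbacks run when the next grace period completes. */
struct toy_rcu {
	toy_cb_t cb[TOY_MAX_CBS];
	void *arg[TOY_MAX_CBS];
	size_t nr;
	unsigned long gp_seq;	/* grace periods completed so far */
};

static struct toy_rcu toy;

/* Queue a callback to run after the next toy grace period. */
static void toy_call_rcu(toy_cb_t cb, void *arg)
{
	assert(toy.nr < TOY_MAX_CBS);
	toy.cb[toy.nr] = cb;
	toy.arg[toy.nr] = arg;
	toy.nr++;
}

/*
 * Complete one grace period: run the callbacks queued before it.
 * Callbacks queued while these run have to wait for the next one.
 */
static void toy_grace_period(void)
{
	toy_cb_t cb[TOY_MAX_CBS];
	void *arg[TOY_MAX_CBS];
	size_t n = toy.nr;

	for (size_t i = 0; i < n; i++) {
		cb[i] = toy.cb[i];
		arg[i] = toy.arg[i];
	}
	toy.nr = 0;
	toy.gp_seq++;
	for (size_t i = 0; i < n; i++)
		cb[i](arg[i]);
}

struct toy_obj {
	unsigned long freed_at_gp;	/* 0 until the object is "freed" */
	int extra_deferral;		/* nonzero: defer the free once more */
};

/* Record the grace period at which the object is finally released. */
static void toy_final_free(void *arg)
{
	struct toy_obj *o = arg;

	o->freed_at_gp = toy.gp_seq;
}

/* Models a free callback that is itself invoked after a grace period. */
static void toy_css_free(void *arg)
{
	struct toy_obj *o = arg;

	if (o->extra_deferral)
		toy_call_rcu(toy_final_free, o);	/* waits for another one */
	else
		toy_final_free(o);			/* frees right away */
}

/* Count the grace periods that elapse before the object is released. */
static unsigned long toy_gps_until_free(int extra_deferral)
{
	struct toy_obj o = { 0, extra_deferral };

	toy = (struct toy_rcu){ 0 };
	toy_call_rcu(toy_css_free, &o);
	while (o.freed_at_gp == 0)
		toy_grace_period();
	return o.freed_at_gp;
}
```

In this model the direct free completes after one grace period and the
doubly deferred one needs two, with no additional protection gained
from the second wait.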