Re: [BUG] sched: leaf_cfs_rq_list use after free

From: Tejun Heo
Date: Wed Mar 16 2016 - 12:50:28 EST

Hello, Peter.

On Wed, Mar 16, 2016 at 04:22:45PM +0100, Peter Zijlstra wrote:
> > css_online()
> >
> > The css is now guaranteed to be visible for css_for_each*()
> > iterations. This distinction exists because some controllers
> > need to propagate state changes downwards requiring a new css
> > to become visible before it inherits the current state from
> > the parent. Conversely, there's no reason to use this
> > callback if there's no such requirement.
> >
> > Ex: Freezer which propagates the target state downwards and
> > needs a new child to inherit the current state while
> > iteratable.
> So it looks like sched uses css_online() for no particular reason
> either, I've moved all that into css_alloc().

The parings are alloc <-> free, and online <-> offline,released, so if
you do some part of shutdown in either offline or released, it
probably makes sense to the counterpart of init in online.

> None of that speaks of where Zombies live, am I to infer that Zombies
> pass css_offline() but stall css_released() ?

Yeap, zombies may remain attached to the css before css_released().

> I don't particularly care about iterating css bits, but I do need my
> parent group to still exist, this is now also guaranteed for
> css_release(), right? The above documentation also doesn't mention this;

Yeah, if you do your custom rcu protected data structures which needs
to be accessible after offline, the rules would be the same as
requiring css iteration in the same way, so css_released() would be
the right callback to use.

> in particular I require that css_release() for any group is not called
> before the css_release() of any child group.

That is guaranteed now.

> static void cpu_cgroup_css_free(struct cgroup_subsys_state *css)
> {
> struct task_group *tg = css_tg(css);
> - sched_destroy_group(tg);
> -}
> -
> -static void cpu_cgroup_css_offline(struct cgroup_subsys_state *css)
> -{
> - struct task_group *tg = css_tg(css);
> -
> - sched_offline_group(tg);
> + /*
> + * Relies on the RCU grace period between css_released() and this.
> + */
> + sched_free_group(tg);
> }

Hmmm... I don't think it'd be safe to merge the two ops. Nothing
guarantees that the RCU callback of cpu controller is called after the
cgroup core one and cgroup core one would do use-after-free. Just
changing offline to released should do.