Re: [BUG] sched: leaf_cfs_rq_list use after free
From: Niklas Cassel
Date: Thu Mar 17 2016 - 04:29:19 EST
On 03/16/2016 04:22 PM, Peter Zijlstra wrote:
> Subject: sched: Fix/cleanup cgroup teardown/init
>
> The cpu controller hasn't kept up with the various changes in the whole
> cgroup initialization / destruction sequence, and commit 2e91fa7f6d45
> ("cgroup: keep zombies associated with their original cgroups") caused
> it to explode.
>
> The reason for this is that zombies do not inhibit css_offline() from
> being called, but do stall css_released(). Now we tear down the cfs_rq
> structures on css_offline() but zombies can run after that, leading to
> use-after-free issues.
>
> The solution is to move the tear-down to css_released(), which
> guarantees nobody (including no zombies) is still using our cgroup.
>
> Furthermore, a few simple cleanups are possible too. There doesn't
> appear to be any point to us using css_online() (anymore?) so fold that
> in css_alloc().
>
> And since cgroup code guarantees an RCU grace period between
> css_released() and css_free() we can forgo using call_rcu() and free the
> stuff immediately.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups")
> Suggested-by: Tejun Heo <tj@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Survived 500 reboots. Without the patch, I've never gone past 84 reboots.
Tested-by: Niklas Cassel <niklas.cassel@xxxxxxxx>