Re: [PATCH 2/4] cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
From: Tejun Heo
Date: Fri Nov 14 2025 - 13:18:14 EST
Hello,

On Fri, Nov 14, 2025 at 06:48:17PM +0100, Michal Koutný wrote:
> On Tue, Oct 28, 2025 at 08:19:16PM -1000, Tejun Heo <tj@xxxxxxxxxx> wrote:
> > An upcoming patch will defer the dying_tasks list addition, moving it from
> > cgroup_task_exit() (called from do_exit()) to a new function called from
> > finish_task_switch(). However, release_task() (which calls
> > cgroup_task_release()) can run either before or after finish_task_switch(),
>
> Just for better understanding -- when can release_task() run before
> finish_task_switch()?

I didn't test this explicitly, so please take it with a grain of salt, but I
think release_task() can run before the final task switch in both the
autoreap and !autoreap cases.

- When autoreap, the dying task calls exit_notify(), which ends up calling
  release_task() on itself. This is obviously before the final switch.

- When !autoreap, it's a race. After exit_notify(), the parent can wait on
  the zombie task at any time, which calls release_task() through
  wait_task_zombie(). This can happen either before or after
  finish_task_switch().
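
For reference, the two cases can be told apart from userspace. The sketch
below is only an illustration, not something from the patch series: the
helper name parent_reaps_child() is made up, and it cannot observe the
ordering against finish_task_switch(). It only shows which side ends up
releasing the child. With SIGCHLD ignored, the exiting child is auto-reaped
and the parent's waitpid() fails with ECHILD. With the default disposition,
the child stays a zombie until the parent waits on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not kernel code): fork a child that exits right
 * away, then try to wait for it.
 *
 * Returns 1 if the parent collected the child through waitpid()
 * (the !autoreap case), 0 if the child was already gone and waitpid()
 * failed with ECHILD (the autoreap case), and -1 on any other error.
 */
static int parent_reaps_child(int ignore_sigchld)
{
	struct sigaction sa = { 0 };
	pid_t pid, ret;

	/* Ignoring SIGCHLD asks the kernel to auto-reap exiting children. */
	sa.sa_handler = ignore_sigchld ? SIG_IGN : SIG_DFL;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, NULL) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: exit immediately */

	/* Retry if the wait is interrupted by a signal. */
	do {
		ret = waitpid(pid, NULL, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret == pid)
		return 1;	/* parent reaped the zombie */
	if (ret < 0 && errno == ECHILD)
		return 0;	/* child was auto-reaped on exit */
	return -1;
}
```

Calling parent_reaps_child(0) exercises the !autoreap path (the parent
waits on the zombie) and parent_reaps_child(1) exercises the autoreap
path (the child is released at exit).
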
> > creating a race where cgroup_task_release() might try to remove the task from
> > dying_tasks before or while it's being added.
> >
> > Move the list_del_init() from cgroup_task_release() to cgroup_task_free() to
> > fix this race. cgroup_task_free() runs from __put_task_struct(), which is
> > always after both paths, making the cleanup safe.
>
> (Ah, now I get the reasoning of more likely pids '0' for CSS_TASK_ITER_PROCS.)

Yeah, I thought about filtering it out better, but if we can already show
pid 0 for foreign ns tasks, maybe this is okay. What do you think?

Thanks.

--
tejun