Re: [PATCH] cgroup: fix race between fork and cgroup.kill
From: Tejun Heo
Date: Sun Feb 02 2025 - 11:56:31 EST
On Thu, Jan 30, 2025 at 04:05:42PM -0800, Shakeel Butt wrote:
> Tejun reported the following race between fork() and cgroup.kill at [1].
>
> Tejun:
> I was looking at cgroup.kill implementation and wondering whether there
> could be a race window. So, __cgroup_kill() does the following:
>
> k1. Set CGRP_KILL.
> k2. Iterate tasks and deliver SIGKILL.
> k3. Clear CGRP_KILL.
>
> The copy_process() does the following:
>
> c1. Copy a bunch of stuff.
> c2. Grab siglock.
> c3. Check fatal_signal_pending().
> c4. Commit to forking.
> c5. Release siglock.
> c6. Call cgroup_post_fork() which puts the task on the css_set and tests
> CGRP_KILL.
>
> The intention seems to be that either a forking task gets SIGKILL and
> terminates on c3 or it sees CGRP_KILL on c6 and kills the child. However, I
> don't see what guarantees that k3 can't happen before c6. ie. After a
> forking task passes c5, k2 can take place and then before the forking task
> reaches c6, k3 can happen. Then, nobody would send SIGKILL to the child.
> What am I missing?
>
> This is indeed a race. One way to fix this race is by taking
> cgroup_threadgroup_rwsem in write mode in __cgroup_kill() as the fork()
> side takes cgroup_threadgroup_rwsem in read mode from cgroup_can_fork()
> to cgroup_post_fork(). However that would be heavy handed as this adds
> one more potential stall scenario for cgroup.kill which is usually
> called under extreme situation like memory pressure.
>
> To fix this race, let's maintain a sequence number per cgroup which gets
> incremented on __cgroup_kill() call. On the fork() side, the
> cgroup_can_fork() will cache the sequence number locally and recheck it
> against the cgroup's sequence number at cgroup_post_fork() site. If the
> sequence numbers mismatch, it means __cgroup_kill() can been called and
> we should send SIGKILL to the newly created task.
>
> Reported-by: Tejun Heo <tj@xxxxxxxxxx>
> Closes: https://lore.kernel.org/all/Z5QHE2Qn-QZ6M-KW@xxxxxxxxxxxxxxx/ [1]
> Fixes: 661ee6280931 ("cgroup: introduce cgroup.kill")
> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
Applied to cgroup/for-6.14-fixes.
Thanks.
--
tejun