Re: Maybe a race window in cgroup.kill?
From: Shakeel Butt
Date: Wed Jan 29 2025 - 14:08:59 EST
On Fri, Jan 24, 2025 at 11:33:07AM -1000, Tejun Heo wrote:
> Hello, Christian.
>
> I was looking at cgroup.kill implementation and wondering whether there
> could be a race window. So, __cgroup_kill() does the following:
>
> k1. Set CGRP_KILL.
> k2. Iterate tasks and deliver SIGKILL.
> k3. Clear CGRP_KILL.
>
> The copy_process() does the following:
>
> c1. Copy a bunch of stuff.
> c2. Grab siglock.
> c3. Check fatal_signal_pending().
> c4. Commit to forking.
> c5. Release siglock.
> c6. Call cgroup_post_fork() which puts the task on the css_set and tests
> CGRP_KILL.
>
> The intention seems to be that either a forking task gets SIGKILL and
> terminates on c3 or it sees CGRP_KILL on c6 and kills the child. However, I
> don't see what guarantees that k3 can't happen before c6. ie. After a
> forking task passes c5, k2 can take place and then before the forking task
> reaches c6, k3 can happen. Then, nobody would send SIGKILL to the child.
> What am I missing?
>
> Thanks.
I think this is indeed the race though small. One way to fix this is by
taking cgroup_threadgroup_rwsem in write mode in __cgroup_kill() as the
fork side takes it in read mode from cgroup_can_fork() to
cgroup_post_fork(). Though I think we should avoid that as this adds
one more potential stall scenario for cgroup.kill which is usually
triggered under extreme situation (memory pressure). I have prototyped a
sequence number based approach below. If that is acceptable then I can
proposed the patch with detailed commit message.