Re: [PATCH RFC v3 2/4] pidfd: add CLONE_PIDFD_AUTOKILL
From: Jann Horn
Date: Tue Feb 17 2026 - 18:38:58 EST
On Wed, Feb 18, 2026 at 12:18 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 17 Feb 2026 at 14:36, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's
> > lifetime to the pidfd returned from clone3(). When the last reference to
> > the struct file created by clone3() is closed the kernel sends SIGKILL
> > to the child.
>
> Did I read this right? You can now basically kill suid binaries that
> you started but don't have rights to kill any other way.
>
> If I'm right, this is completely broken. Please explain.

You can already send SIGHUP to such binaries through things like job
control, right?
Do we know if there are setuid binaries out there that change their
ruid and suid to prevent being killable via kill_ok_by_cred(), then
set SIGHUP to SIG_IGN to not be killable via job control, and then do
some work that shouldn't be interrupted?
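
To make the question concrete, here is a minimal sketch of what such a binary would have to do at startup. kill_ok_by_cred() compares the sender's uid/euid against the target's ruid and suid, so the binary has to move both away from the invoking user. The function name shield_from_invoker() is made up for illustration, and this is not taken from any real setuid program:

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical startup step for a setuid binary that wants to finish
 * uninterrupted work. Returns 0 on success, -1 on failure.
 */
int shield_from_invoker(void)
{
    uid_t euid = geteuid();

    /*
     * Set the real UID to the effective UID. On Linux, setreuid() with
     * a real UID argument other than -1 also sets the saved set-user-ID
     * to the new effective UID, so afterwards neither ruid nor suid
     * matches the user who launched the binary.
     */
    if (setreuid(euid, euid) != 0)
        return -1;

    /* Ignore SIGHUP so that job control cannot terminate the process. */
    if (signal(SIGHUP, SIG_IGN) == SIG_ERR)
        return -1;

    return 0;
}
```

A binary that only does the first step, or only the second, is already killable by its invoker today; only one that does both would newly become killable.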
Also, on a Linux system with systemd, I believe a normal user, when
running in the context of a user session (but not when running in the
context of a system service), can already SIGKILL anything they launch
by launching it in a systemd user service, then doing something like
"echo 1 > /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/app.slice/<servicename>.scope/cgroup.kill"
because systemd delegates cgroups for anything a user runs to that
user; and cgroup.kill goes through the codepath
cgroup_kill_write -> cgroup_kill -> __cgroup_kill -> send_sig(SIGKILL,
task, 0) -> send_sig_info -> do_send_sig_info
which, as far as I know, bypasses the normal signal sending permission
checks. (For comparison, group_send_sig_info() first calls
check_kill_permission(), then do_send_sig_info().)
I agree that this would be a change to the security model, but I'm not
sure if it would be that big a change. I guess an alternative might be
to instead gate the clone3() flag on a `task_no_new_privs(current) ||
ns_capable()` check like in seccomp, but that might be too restrictive
for the use cases Christian has in mind...
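
For illustration, if the flag were gated that way, an unprivileged caller would have to opt into no_new_privs before calling clone3(), the same way it already has to before installing a seccomp filter. A minimal userspace sketch using prctl(); the helper names are made up, and nothing here comes from Christian's patch:

```c
#include <sys/prctl.h>

/*
 * Set the no_new_privs bit on the calling thread.
 * Returns 0 on success, -1 on failure.
 */
int enable_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Query the no_new_privs bit of the calling thread.
 * Returns 1 if set, 0 if not set, -1 on failure.
 */
int has_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Once set, the bit cannot be cleared and is inherited across fork() and execve(), so a child started this way could never gain privileges through a setuid binary in the first place. That is what would make the gate safe, and also what might make it too restrictive.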