Re: [PATCH RFC v3 2/4] pidfd: add CLONE_PIDFD_AUTOKILL
From: Christian Brauner
Date: Wed Feb 18 2026 - 05:04:26 EST
On Wed, Feb 18, 2026 at 12:43:59AM +0100, Jann Horn wrote:
> On Tue, Feb 17, 2026 at 11:36 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's
> > lifetime to the pidfd returned from clone3(). When the last reference to
> > the struct file created by clone3() is closed the kernel sends SIGKILL
> > to the child. A pidfd obtained via pidfd_open() for the same process
> > does not keep the child alive and does not trigger autokill - only the
> > specific struct file from clone3() has this property.
> >
> > This is useful for container runtimes, service managers, and sandboxed
> > subprocess execution - any scenario where the child must die if the
> > parent crashes or abandons the pidfd.
>
> Idle thought, feel free to ignore:
> In those scenarios, I guess what you'd ideally want would be a way to
> kill the entire process hierarchy, not just the one process that was
> spawned? Unless the process is anyway PID 1 of its own pid namespace.
> But that would probably be more invasive and kind of an orthogonal
> feature...

It's something I have on my to-do list as an exploration item. :)

>
> [...]
> > +static int pidfs_file_release(struct inode *inode, struct file *file)
> > +{
> > +	struct pid *pid = inode->i_private;
> > +	struct task_struct *task;
> > +
> > +	guard(rcu)();
> > +	task = pid_task(pid, PIDTYPE_TGID);
> > +	if (task && READ_ONCE(task->signal->autokill_pidfd) == file)
>
> Can you maybe also clear out the task->signal->autokill_pidfd pointer
> here? It should be fine in practice either way, but theoretically,
> with the current code, this equality check could wrongly match if the
> actual autokill file has been released and a new pidfd file has been
> reallocated at the same address... Of course, at worst that would kill
> a task that has already been killed, so it wouldn't be particularly
> bad, but still it's ugly.

Yes, of course.

>
> > +		do_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_TGID);
> > +
> > +	return 0;
> > +}
> [...]
> > @@ -2470,8 +2479,11 @@ __latent_entropy struct task_struct *copy_process(
> >  	syscall_tracepoint_update(p);
> >  	write_unlock_irq(&tasklist_lock);
> >
> > -	if (pidfile)
> > +	if (pidfile) {
> > +		if (clone_flags & CLONE_PIDFD_AUTOKILL)
> > +			p->signal->autokill_pidfd = pidfile;
>
> WRITE_ONCE() to match the READ_ONCE() in pidfs_file_release()?
Agreed.
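
The two review points above (clear the pointer on release, and pair the
READ_ONCE() with a matching single store) can be pictured with a small
userspace model. This is only a sketch under stated assumptions, not
the kernel change: every name in it (model_file, model_signal,
model_arm, model_release, model_demo) is hypothetical, C11 atomics
stand in for WRITE_ONCE()/READ_ONCE(), and a counter stands in for
sending SIGKILL. Doing the comparison and the clear as one
compare-and-exchange is a choice made for this model; the thread only
asks that the pointer be cleared.

```c
/* Userspace model only, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_file { int id; };	/* stands in for struct file */

struct model_signal {
	/* stands in for task->signal->autokill_pidfd */
	_Atomic(struct model_file *) autokill_pidfd;
	int kills_sent;		/* counts simulated SIGKILLs */
};

/* Models the store in copy_process(): publish the pointer with one
 * atomic store (the patch would use WRITE_ONCE() here). */
static void model_arm(struct model_signal *sig, struct model_file *file)
{
	atomic_store_explicit(&sig->autokill_pidfd, file,
			      memory_order_relaxed);
}

/* Models pidfs_file_release(): "kill" only if this file is the armed
 * one, and clear the pointer in the same step so that a later file
 * allocated at the same address can no longer match.
 * Returns 1 if a kill was sent, 0 otherwise. */
static int model_release(struct model_signal *sig, struct model_file *file)
{
	struct model_file *expected = file;

	if (atomic_compare_exchange_strong_explicit(&sig->autokill_pidfd,
						    &expected, NULL,
						    memory_order_relaxed,
						    memory_order_relaxed)) {
		sig->kills_sent++;
		return 1;
	}
	return 0;
}

/* Arms one file, then releases the same address twice; returns the
 * number of simulated kills. */
int model_demo(void)
{
	struct model_file f = { 1 };
	struct model_signal sig = { .kills_sent = 0 };

	atomic_init(&sig.autokill_pidfd, NULL);
	model_arm(&sig, &f);
	model_release(&sig, &f);	/* armed file released: matches */
	model_release(&sig, &f);	/* same address again: pointer was cleared */
	return sig.kills_sent;
}
```

In this model the second release of the same address does not match,
because the first release already cleared the pointer; that is the
address-reuse case described in the review comment.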