Re: [PATCH v9] pidfd: add ioctl to retrieve pid info

From: Christian Brauner
Date: Tue Oct 08 2024 - 09:24:26 EST


> > +#ifdef CONFIG_CGROUPS
> > + if (request_mask & PIDFD_INFO_CGROUPID) {
> > + struct cgroup *cgrp;
> > +
> > + guard(rcu)();
> > + cgrp = task_cgroup(task, pids_cgrp_id);
> > + if (!cgrp)
> > + return -ENODEV;
>
> Afaict this means that the task has already exited. In other words, the
> cgroup id cannot be retrieved anymore for a task that has exited but not
> been reaped. Frankly, I would have expected the cgroup id to be
> retrievable until the task has been reaped but that's another
> discussion.
>
> My point is if you contrast this with the other information in here: If
> the task has exited but hasn't been reaped then you can still get
> credentials such as *uid/*gid, and pid namespace relative information
> such as pid/tgid/ppid.

Related to this and I just want to dump this idea somewhere:

I'm aware that it is often desirable or useful to have information about
a task around even after the task has exited and been reaped.

The exit status comes to mind but maybe there's other stuff that would
be useful to have.

Since we changed to pidfs we know that all pidfds no matter if they
point to the same struct file (e.g., dup()) or to multiple struct files
(e.g., multiple pidfd_open() on the same pid) all point to the same
dentry and inode. Which is why we switched to stashing struct pid in
inode->i_private.

So we could easily do something like this:

// SKETCH SKETCH SKETCH
diff --git a/fs/pidfs.c b/fs/pidfs.c
index 7ffdc88dfb52..eeeb907f4889 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -344,9 +344,24 @@ static const struct dentry_operations pidfs_dentry_operations = {
.d_prune = stashed_dentry_prune,
};

+struct pidfd_kinfo {
+ // We could even move this back to file->private_data to avoid the
+ // additional pointer deref though I doubt it matters.
+ struct pid *pid;
+ int exit_status;
+ // other stuff;
+};
+
static int pidfs_init_inode(struct inode *inode, void *data)
{
- inode->i_private = data;
+ struct pidfd_kinfo *kinfo;
+
+ kinfo = kzalloc(sizeof(*info), GFP_KERNEL_ACCOUNT);
+ if (!kinfo)
+ return -ENOMEM;
+
+ kinfo->pid = data;
+ inode->i_private = kinfo;
inode->i_flags |= S_PRIVATE;
inode->i_mode |= S_IRWXU;
inode->i_op = &pidfs_inode_operations;

and that enables us to persist information past task exit so that as
long as you hold the pidfd you can e.g., query for the exit state of the
task or something.

I'm mostly thinking out loud but I think that could be potentially
interesting.