Re: [PATCH v9] pidfd: add ioctl to retrieve pid info
From: Christian Brauner
Date: Thu Oct 10 2024 - 05:38:04 EST
On Thu, Oct 10, 2024 at 07:56:53AM +1100, Aleksa Sarai wrote:
> On 2024-10-09, Jonathan Corbet <corbet@xxxxxxx> wrote:
> > luca.boccassi@xxxxxxxxx writes:
> >
> > > As discussed at LPC24, add an ioctl with an extensible struct
> > > so that more parameters can be added later if needed. Start with
> > > returning pid/tgid/ppid and creds unconditionally, and cgroupid
> > > optionally.
> >
> > I was looking this over, and a couple of questions came to mind...
> >
> > > Signed-off-by: Luca Boccassi <luca.boccassi@xxxxxxxxx>
> > > ---
> >
> > [...]
> >
> > > diff --git a/fs/pidfs.c b/fs/pidfs.c
> > > index 80675b6bf884..15cdc7fe4968 100644
> > > --- a/fs/pidfs.c
> > > +++ b/fs/pidfs.c
> > > @@ -2,6 +2,7 @@
> > > #include <linux/anon_inodes.h>
> > > #include <linux/file.h>
> > > #include <linux/fs.h>
> > > +#include <linux/cgroup.h>
> > > #include <linux/magic.h>
> > > #include <linux/mount.h>
> > > #include <linux/pid.h>
> > > @@ -114,6 +115,83 @@ static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
> > > return poll_flags;
> > > }
> > >
> > > +static long pidfd_info(struct task_struct *task, unsigned int cmd, unsigned long arg)
> > > +{
> > > + struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
> > > + size_t usize = _IOC_SIZE(cmd);
> > > + struct pidfd_info kinfo = {};
> > > + struct user_namespace *user_ns;
> > > + const struct cred *c;
> > > + __u64 request_mask;
> > > +
> > > + if (!uinfo)
> > > + return -EINVAL;
> > > + if (usize < sizeof(struct pidfd_info))
> > > + return -EINVAL; /* First version, no smaller struct possible */
> > > +
> > > + if (copy_from_user(&request_mask, &uinfo->request_mask, sizeof(request_mask)))
> > > + return -EFAULT;
> >
> > You don't check request_mask for unrecognized flags, so user space will
> > not get an error if it puts random gunk there. That, in turn, can make
> > it harder to add new options in the future.
>
> In fairness, this is how statx works and statx does this to not require
> syscall retries to figure out what flags the current kernel supports and
> instead defers that to stx_mask.
pidfd_info overwrites the request_mask with what is supported by the
kernel. I don't think userspace setting random stuff in the request_mask
is a problem. It would already be a problem with statx() and we haven't
seen that so far.
If userspace happens to set a some random bit in the request_mask and
that bit ends up being used a few kernel releases later to e.g.,
retrieve additional information then all that happens is that userspace
would now receive information they didn't need. That's not a problem.
It is of course very different to e.g. adding a random bit in the flag
mask of clone3() or mount_setattr() or any system call that changes
kernel state based on the passed bits. In that case ignoring unknown
bits and then starting to use them is obviously a big problem.
The other related problem would be flag deprecation and reuse of a flag
which (CLONE_DETACHED -> CLONE_PIDFD) also is only a real problem for
system calls that alter kernel state.
So overally, I think ignoring uknown bits in the request mask is safe.
It needs to be documented of course.