Re: [PATCH v5] pidfd: add ioctl to retrieve pid info
From: Luca Boccassi
Date: Sun Oct 06 2024 - 13:53:06 EST
On Sun, 6 Oct 2024 at 18:22, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 10/06, luca.boccassi@xxxxxxxxx wrote:
> >
> > +static long pidfd_info(struct task_struct *task, unsigned int cmd, unsigned long arg)
> > +{
> > + struct pidfd_info __user *uinfo = (struct pidfd_info __user *)arg;
> > + size_t usize = _IOC_SIZE(cmd);
> > + struct pidfd_info kinfo = {};
> > + struct user_namespace *user_ns;
> > + const struct cred *c;
> > + __u64 request_mask;
> > +
> > + if (!uinfo)
> > + return -EINVAL;
> > + if (usize < sizeof(struct pidfd_info))
> > + return -EINVAL; /* First version, no smaller struct possible */
> > +
> > + if (copy_from_user(&request_mask, &uinfo->request_mask, sizeof(request_mask)))
> > + return -EFAULT;
> > +
> > + c = get_task_cred(task);
> > + if (!c)
> > + return -ESRCH;
> > +
> > + /* Unconditionally return identifiers and credentials, the rest only on request */
> > +
> > + kinfo.pid = task_pid_vnr(task);
> > + kinfo.tgid = task_tgid_vnr(task);
> > + kinfo.ppid = task_ppid_nr_ns(task, task_active_pid_ns(task));
> ^^^^^^^^^^^^^^^^^^^^^^^^
> The same problem as with "info.pid = pid_nr_ns(pid, task_active_pid_ns(task));"
> you used before. You should use the caller's namespace, not the task's namespace.
>
> kinfo.ppid = task_ppid_nr_ns(task, NULL);
>
> should work, see __task_pid_nr_ns() which uses task_active_pid_ns(current) if
> ns == NULL.
>
> > + /*
> > + * One last check before returning: the task might have exited while we
> > + * were fetching all the data, so check again to be safe. */
> > + if (task_pid_vnr(task) == 0)
> > + return -ESRCH;
>
> Well, this looks strange. It would be better to kill "kinfo.pid = task_pid_vnr()"
> above and do
>
> kinfo.pid = task_pid_vnr(task);
> if (!kinfo.pid)
> return -ESRCH;
>
> at the end, but this is minor.
>
> I don't think we can rely on this check.
>
> Suppose that pidfd_info() runs on CPU_0 and it races with __unhash_process(task)
> on CPU_1 which does
>
> detach_pid(p, PIDTYPE_PID);
> detach_pid(p, PIDTYPE_TGID);
>
> Without the barries/locking CPU_0 can see the changes above out of order,
> so it is possible that pidfd_info() will see task_pid_vnr(task) != 0, but
> report kinfo.tgid == 0.
>
> Oleg.
I see, so what should I do here then? Check both? Or none? The caller
needs to verify that the data is still valid at the point they use it
anyway, so this helps but provides no guarantees anyway