Re: [PATCH RFC 3/4] pidfd: improve uapi when task isn't found
From: Christian Brauner
Date: Fri Apr 04 2025 - 09:39:32 EST
On Fri, Apr 04, 2025 at 02:37:38PM +0200, Oleg Nesterov wrote:
> On 04/03, Christian Brauner wrote:
> >
> > We currently report EINVAL whenever a struct pid has no tasked attached
> > anymore thereby conflating two concepts:
> >
> > (1) The task has already been reaped.
> > (2) The caller requested a pidfd for a thread-group leader but the pid
> > actually references a struct pid that isn't used as a thread-group
> > leader.
> >
> > This is causing issues for non-threaded workloads as in [1].
> >
> > This patch tries to allow userspace to distinguish between (1) and (2).
> > This is racy of course but that shouldn't matter.
> >
> > Link: https://github.com/systemd/systemd/pull/36982 [1]
> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
>
> For this series:
>
> Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx>
>
>
> But I have a couple of cosmetic nits...
>
> > int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > {
> > - bool thread = flags & PIDFD_THREAD;
> > + int err = 0;
> >
> > - if (!pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
> > - return -EINVAL;
> > + if (!(flags & PIDFD_THREAD)) {
> > + /*
> > + * If this is struct pid isn't used as a thread-group
> > + * leader pid but the caller requested to create a
> > + * thread-group leader pidfd then report ENOENT to the
> > + * caller as a hint.
> > + */
> > + if (!pid_has_task(pid, PIDTYPE_TGID))
> > + err = -ENOENT;
> > + }
> > +
> > + /*
> > + * If this wasn't a thread-group leader struct pid or the task
> > + * got reaped in the meantime report -ESRCH to userspace.
> > + *
> > + * This is racy of course. This could've not been a thread-group
> > + * leader struct pid and we set ENOENT above but in the meantime
> > + * the task got reaped. Or there was a multi-threaded-exec by a
> > + * subthread and we were a thread-group leader but now got
> > + * killed.
>
> The comment about the multi-threaded-exec looks a bit misleading to me.
> If this pid is a group-leader-pid and we race with de_thread() which does
>
> exchange_tids(tsk, leader);
> transfer_pid(leader, tsk, PIDTYPE_TGID);
>
> nothing "bad" can happen, both pid_has_task(PIDTYPE_PID) or
> pid_has_task(PIDTYPE_TGID) can't return NULL during (or after) this
> transition.
>
> hlists_swap_heads_rcu() or hlist_replace_rcu() can't make
> hlist_head->first == NULL during this transition...
Good point.
>
> Or I misunderstood the comment?
>
> And... the code looks a bit overcomplicated to me, why not simply
>
> int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> {
> if (!pid_has_task(pid, PIDTYPE_PID))
> return -ESRCH;
>
> if (!(flags & PIDFD_THREAD) && !pid_has_task(pid, PIDTYPE_TGID))
> return -ENOENT;
I thought that checking PIDTYPE_PID first could cause misleading results
where we report ENOENT where we should report ESRCH: If the task was
released after the successful PIDTYPE_PID check for a pid that was never
a thread-group leader we report ENOENT. That's what I reversed the
check. But I can adapt that to you scheme. I mostly wanted a place to
put the comments.
>
> return __pidfd_prepare(pid, flags, ret);
> }
>
> ? Of course, the comments should stay.
>
> But again, this is cosmetic/subjective, please do what you like more.
>
> Oleg.
>