Re: [PATCH] pid: allow pidfds for reaped tasks

From: Alexander Mikhalitsyn
Date: Mon Aug 07 2023 - 05:02:15 EST


On Mon, Aug 7, 2023 at 10:52 AM David Rheinsberg <david@xxxxxxxxxxxx> wrote:
>
> A pidfd can currently only be created for tasks that are thread-group
> leaders and not reaped. This patch changes the pidfd-core to allow for
> pidfds on reapead thread-group leaders as well.
>
> A pidfd can outlive the task it refers to, and thus user-space must
> already be prepared that the task underlying a pidfd is gone at the time
> they get their hands on the pidfd. For instance, resolving the pidfd to
> a PID via the fdinfo must be prepared to read `-1`.
>
> Despite user-space knowing that a pidfd might be stale, several kernel
> APIs currently add another layer that checks for this. In particular,
> SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
> but returns a stale pidfd if the task is reaped immediately after the
> respective alive-check.
>
> This has the unfortunate effect that user-space now has two ways to
> check for the exact same scenario: A syscall might return
> EINVAL/ESRCH/... *or* the pidfd might be stale, even though there is no
> particular reason to distinguish both cases. This also propagates
> through user-space APIs, which pass on pidfds. They must be prepared to
> pass on `-1` *or* the pidfd, because there is no guaranteed way to get a
> stale pidfd from the kernel.
>
> This patch changes the core pidfd helpers to allow creation of pidfds
> even if the PID is no longer linked to any task. This only affects one
> of the three pidfd users that currently exist:
>
> 1) fanotify already tests for a linked TGID-task manually before
> creating the PIDFD, thus it is not directly affected by this change.
> However, note that the current fanotify code fails with an error if
> the target process is reaped exactly between the TGID-check in
> fanotify and the test in pidfd_prepare(). With this patch, this
> will no longer be the case.
>
> 2) pidfd_open(2) calls find_get_pid() before creating the pidfd, thus
> it is also not directly affected by this change.
> Again, similar to fanotify, there is a race between the
> find_get_pid() call and pidfd_prepare(), which currently causes
> pidfd_open(2) to return EINVAL rather than ESRCH if the process is
> reaped just between those two checks. With this patch, this will no
> longer be the case.
>
> 3) SO_PEERPIDFD will be affected by this change and from now on return
> stale pidfds rather than EINVAL if the respective peer task is
> reaped already.
>
> Given that users of SO_PEERPIDFD must already deal with stale pidfds,
> this change hopefully simplifies the API of SO_PEERPIDFD, and all
> dependent user-space APIs (e.g., GetConnectionCredentials() on D-Bus
> driver APIs). Also note that SO_PEERPIDFD is still pending to be
> released with linux-6.5.
>
> Signed-off-by: David Rheinsberg <david@xxxxxxxxxxxx>
> ---
> kernel/fork.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d2e12b6d2b18..4dde19a8c264 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2161,7 +2161,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
> * Allocate a new file that stashes @pid and reserve a new pidfd number in the
> * caller's file descriptor table. The pidfd is reserved but not installed yet.
> *
> - * The helper verifies that @pid is used as a thread group leader.
> + * The helper verifies that @pid is/was used as a thread group leader.
> *
> * If this function returns successfully the caller is responsible to either
> * call fd_install() passing the returned pidfd and pidfd file as arguments in
> @@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
> */
> int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> {
> - if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> + if (!pid)
> + return -EINVAL;
> +
> + /*
> + * Non thread-group leaders cannot have pidfds, but we allow them for
> + * reaped thread-group leaders.
> + */
> + if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
> return -EINVAL;

Hi David!

As far as I understand, __unhash_process is always called with a
tasklist_lock held for writing.
Don't we need to take tasklist_lock for reading here to guarantee
consistency between
pid_has_task(pid, PIDTYPE_PID) and pid_has_task(pid, PIDTYPE_TGID)
return values?

Kind regards,
Alex

>
> return __pidfd_prepare(pid, flags, ret);
> --
> 2.41.0
>