[PATCH] pid: allow pidfds for reaped tasks

From: David Rheinsberg
Date: Mon Aug 07 2023 - 04:52:39 EST


A pidfd can currently only be created for tasks that are thread-group
leaders and have not been reaped. This patch changes the pidfd core to
allow pidfds on reaped thread-group leaders as well.

A pidfd can outlive the task it refers to, and thus user-space must
already be prepared for the task underlying a pidfd to be gone by the
time it gets hold of the pidfd. For instance, code that resolves a pidfd
to a PID via fdinfo must be prepared to read `-1`.
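
As an illustration of the fdinfo lookup mentioned above, here is a minimal
user-space sketch. The helper names fdinfo_parse_pid() and pidfd_read_pid()
are hypothetical and not part of any existing API; the sketch only assumes
that the fdinfo file of a pidfd carries a `Pid:` line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan fdinfo-formatted text for the "Pid:" field.
 * Returns 0 and stores the value in *out if the field is found, -1 if not.
 * A stored value of -1 means the task behind the pidfd is gone.
 */
static int fdinfo_parse_pid(const char *text, long *out)
{
    const char *line = text;

    assert(text && out);
    while (line && *line) {
        if (strncmp(line, "Pid:", 4) == 0)
            return sscanf(line + 4, "%ld", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next field */
    }
    return -1;
}

/* Hypothetical helper: read /proc/self/fdinfo/<fd> and extract "Pid:". */
static int pidfd_read_pid(int pidfd, long *out)
{
    char path[64], buf[1024];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return fdinfo_parse_pid(buf, out);
}
```

A caller would treat a return value of 0 with `*out == -1` as "the pidfd is
stale", and a return value of -1 as "this fd has no `Pid:` field or fdinfo
could not be read".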

Despite user-space knowing that a pidfd might be stale, several kernel
APIs currently add another layer that checks for this. In particular,
SO_PEERPIDFD returns `EINVAL` if the peer-task was already reaped,
but returns a stale pidfd if the task is reaped immediately after the
respective alive-check.

This has the unfortunate effect that user-space now has two ways to
check for the exact same scenario: A syscall might return
EINVAL/ESRCH/... *or* the pidfd might be stale, even though there is no
particular reason to distinguish the two cases. This also propagates
through user-space APIs that pass on pidfds. They must be prepared to
pass on `-1` *or* the pidfd, because there is no guaranteed way to get a
stale pidfd from the kernel.
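
To make the duplication concrete, the following sketch shows what a careful
SO_PEERPIDFD caller has to do today. The names peer_state(),
state_from_errno() and state_from_pidfd() are made up for this example, the
fallback value 77 for SO_PEERPIDFD is the asm-generic one and is only an
assumption for headers that lack it, and probing liveness with signal 0 via
pidfd_send_signal(2) is one possible choice, not the only one.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SO_PEERPIDFD
#define SO_PEERPIDFD 77 /* asm-generic value; some architectures differ */
#endif

enum peer_state { PEER_ALIVE, PEER_GONE, PEER_ERROR };

/* Path 1: the kernel refused to hand out a pidfd at all. */
static enum peer_state state_from_errno(int err)
{
    return (err == EINVAL || err == ESRCH) ? PEER_GONE : PEER_ERROR;
}

/* Path 2: a pidfd was returned, but it may already be stale. */
static enum peer_state state_from_pidfd(int pidfd)
{
    assert(pidfd >= 0);
    /* Signal 0 performs the lookup and permission checks only. */
    if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) == 0)
        return PEER_ALIVE;
    if (errno == EPERM)
        return PEER_ALIVE; /* exists, but we may not signal it */
    return errno == ESRCH ? PEER_GONE : PEER_ERROR;
}

/* Hypothetical wrapper: both paths must be handled to detect a dead peer. */
static enum peer_state peer_state(int sock, int *pidfd_out)
{
    int pidfd = -1;
    socklen_t len = sizeof(pidfd);

    *pidfd_out = -1;
    if (getsockopt(sock, SOL_SOCKET, SO_PEERPIDFD, &pidfd, &len) < 0)
        return state_from_errno(errno);
    *pidfd_out = pidfd;
    return state_from_pidfd(pidfd);
}
```

If the kernel always returned a (possibly stale) pidfd, state_from_errno()
would only have to report genuine errors, and the "peer is gone" case would
be handled in exactly one place.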

This patch changes the core pidfd helpers to allow creation of pidfds
even if the PID is no longer linked to any task. This only affects one
of the three pidfd users that currently exist:

1) fanotify already tests for a linked TGID-task manually before
   creating the pidfd, thus it is not directly affected by this change.
   However, note that the current fanotify code fails with an error if
   the target process is reaped exactly between the TGID-check in
   fanotify and the test in pidfd_prepare(). With this patch, this
   will no longer be the case.

2) pidfd_open(2) calls find_get_pid() before creating the pidfd, thus
   it is also not directly affected by this change.
   Again, similar to fanotify, there is a race between the
   find_get_pid() call and pidfd_prepare(), which currently causes
   pidfd_open(2) to return EINVAL rather than ESRCH if the process is
   reaped exactly between those two checks. With this patch, this will
   no longer be the case.

3) SO_PEERPIDFD will be affected by this change and from now on return
   stale pidfds rather than EINVAL if the respective peer task has
   already been reaped.
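
The effect of the changed check in pidfd_prepare() can be summarized with a
small user-space model. This is only a sketch: struct pid_model and the two
*_check_allows() functions are invented for illustration and merely mirror
the conditions in the diff below, they are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct pid: which task links are still attached. */
struct pid_model {
    bool has_pid_task;  /* models pid_has_task(pid, PIDTYPE_PID)  */
    bool has_tgid_task; /* models pid_has_task(pid, PIDTYPE_TGID) */
};

/* Check before the patch: only a linked thread-group leader passes. */
static bool old_check_allows(const struct pid_model *p)
{
    if (!p)
        return false;
    return p->has_tgid_task;
}

/* Check after the patch: reject only a linked task that is not a leader. */
static bool new_check_allows(const struct pid_model *p)
{
    if (!p)
        return false;
    return !(p->has_pid_task && !p->has_tgid_task);
}

/* Walk through the three interesting cases. */
static void check_cases(void)
{
    const struct pid_model leader = { true, true };   /* live leader      */
    const struct pid_model thread = { true, false };  /* live non-leader  */
    const struct pid_model reaped = { false, false }; /* no task linked   */

    assert(old_check_allows(&leader) && new_check_allows(&leader));
    assert(!old_check_allows(&thread) && !new_check_allows(&thread));
    assert(!old_check_allows(&reaped) && new_check_allows(&reaped));
}
```

Only the last case changes: a PID with no task linked to it was rejected
before and is accepted now.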

Given that users of SO_PEERPIDFD must already deal with stale pidfds,
this change hopefully simplifies the API of SO_PEERPIDFD and of all
dependent user-space APIs (e.g., GetConnectionCredentials() on D-Bus
driver APIs). Also note that SO_PEERPIDFD has not been released yet; it
is scheduled to ship with linux-6.5.

Signed-off-by: David Rheinsberg <david@xxxxxxxxxxxx>
---
kernel/fork.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d2e12b6d2b18..4dde19a8c264 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,7 +2161,7 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  * Allocate a new file that stashes @pid and reserve a new pidfd number in the
  * caller's file descriptor table. The pidfd is reserved but not installed yet.
  *
- * The helper verifies that @pid is used as a thread group leader.
+ * The helper verifies that @pid is/was used as a thread group leader.
  *
  * If this function returns successfully the caller is responsible to either
  * call fd_install() passing the returned pidfd and pidfd file as arguments in
@@ -2180,7 +2180,14 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+	if (!pid)
+		return -EINVAL;
+
+	/*
+	 * Non thread-group leaders cannot have pidfds, but we allow them for
+	 * reaped thread-group leaders.
+	 */
+	if (pid_has_task(pid, PIDTYPE_PID) && !pid_has_task(pid, PIDTYPE_TGID))
 		return -EINVAL;
 
 	return __pidfd_prepare(pid, flags, ret);
--
2.41.0