Re: [PATCH] pid: Allow creation of pidfds to threads

From: Matthew Bobrowski
Date: Wed Apr 13 2022 - 00:46:41 EST


On Fri, Apr 01, 2022 at 11:42:25AM +0200, Christian Brauner wrote:
> On Fri, Apr 01, 2022 at 11:00:27AM +0200, Alois Wohlschlager wrote:
> > Hello Christian,
> >
> > > We originally blocked this because it is not as easy as simply allowing
> > > pidfds to be created for non-thread-group leaders.
> > > For a start, pidfd_poll() currently doens't work if pidfd_task() isn't a
> > > thread-group leader
> >
> > I did notice the hang there, that's why my patch changes pidfd_poll to return
> > error on tasks which are not thread-group leaders. IIRC, waiting on specific
> > threads is not supported by Linux at all, so I don't see a problem with not
> > supporting it here either.
>
> In general, it would be quite neat if we could get notified about thread
> exit through poll though. That'd be pretty useful. But maybe it's indeed
> ok to just not support this (for now at least).
>
> I know that systemd is using pidfds in their event loop so I'd need to
> see whether they'd want support for this behavior.
>
> >
> > > and you'll just hang for CLONE_PIDFD | CLONE_THREAD.
> >
> > No, CLONE_PIDFD | CLONE_THREAD behavior is unchanged, it will still fail with
> > EINVAL. I actually confirmed this by double-checking right now.
>
> I just used the two flags as a shorthand for pidfds referring to
> threads. That might've been misleading here.
>
> >
> > > So at least that needs to be adapated as well and there's likely a bunch
> > > of other corner-cases I'm forgetting about.
> >
> > I'd be happy to hear about other corner-cases so I can fix them.
>
> I need to play with this patch a little and see what current
> expectations we do have in the code.
>
> There are various consumers of pidfds and they all have been added with
> the assumption that a pidfd refers to a thread-group leader. We should
> go through them and see whether changing them to operate on threads is
> sane before we can just switch the generic helper.
>
> Bot process_madvise() and process_mrelease() should be fine to operate
> on threads afaict from the discussion when they were added.
>
> For pidfd_send_signal() we likely want to at least consider adding the
> ability to send a thread-specific signal, i.e. supporting tgkill()
> behavior. As it stands it currently only supports kill()-like behavior
> where the signal that gets sent is thread-group directed.
>
> I roughly had originally envisioned this to be supportable through the
> addition of a new flag to pidfd_send_signal() so callers would be able
> to select whether to send a thread-specific signal or not. What do
> people think of that?

Sorry, I've been on parental leave for the last couple of months and
I'm now playing catch-up.

For the fanotify API i.e. FAN_REPORT_PIDFD, I don't see there being
any issues with supporting/returning pidfds which belong to
non-thread-group leaders. In saying that, for this to be useful from
the fanotify API POV, I definitely do think we should consider
supporting the ability to send thread-specific signals via
pidfd_send_signal(). Adding this extension through the optional flag
parameter makes sense to me.

> > > Do you have a concrete use-case you want this for?
> >
> > My use-case is basically making pidfd_getfd actually useful for its intended
> > purpose: there is a seccomp_unotify-based supervisor that wants to obtain a
> > file descriptor from its guest. This currently does not work if the action to
> > be forwarded to the supervisor is performed in a secondary thread, since there
> > is no way to obtain the required pidfd.
>
> Yeah, I'm well aware of that. I've been working around this limitation
> in our implementation for the seccomp notifier for quite a long time
> when intercepting the bpf()-syscall.

/M