Re: [PATCH v2 0/5] pid: add pidfd_open()

From: Jonathan Kowalski
Date: Mon Apr 01 2019 - 06:03:32 EST


On Mon, Apr 1, 2019 at 1:53 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Mon, Apr 1, 2019 at 12:33 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
> > On Sun, Mar 31, 2019 at 03:16:47PM -0700, Linus Torvalds wrote:
> > > On Sun, Mar 31, 2019 at 3:03 PM Christian Brauner <christian@xxxxxxxxxx> wrote:
> > > > Thanks for the input. The problem Jann and I saw with this is that it
> > > > would be awkward to have the kernel open a file in some procfs instance,
> > > > since then userspace would have to specify which procfs instance the fd
> > > > should come from.
> > >
> > > I would actually suggest we just make the rules be that the
> > > pidfd_open() always return the internal /proc entry regardless of any
> > > mount-point (or any "hidepid") but also suggest that exactly *because*
> > > it gives you visibility into the target pid, you'd basically require
> > > the strictest kind of control of the process you're trying to get the
> > > pidfd of.
> > >
> > > Ie likely something along the lines of
> > >
> > > ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS)

This then restricts use of the API under Yama etc. to processes
that have CAP_SYS_PTRACE, or to parents wanting to manage their
children (which has worked fine for all these years anyway).

If pidfds were just stable file descriptors referring to the process,
none of this would be a problem. You would need only the normal
permissions when signalling through the pidfd (and with CAP_KILL in
the owning userns, you could send it any signal), and ptrace
privileges when using the pidfd with ptrace itself (suppose we extend
ptrace to take a pidfd in the future; it has a well-established
model). That gives some separation of responsibilities, and is more
useful in general for userspace IMO.
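
To make the signalling case concrete, here is a minimal sketch of
sending a signal through a pidfd with the pidfd_send_signal() syscall.
The wrapper name is made up for illustration, and the fallback syscall
number is an assumption that holds only on architectures using the
unified numbering. As far as I understand it, the permission check is
the same one kill(2) applies, with no ptrace-level access involved.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: older headers may not define this; 424 is the number on
 * architectures with unified syscall numbering (e.g. x86_64, arm64). */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* Hypothetical wrapper: send `sig` to the process referred to by
 * `pidfd`. Returns 0 on success, -1 with errno set on failure.
 * Passing sig == 0 only performs the existence/permission check. */
static int pidfd_send_signal_sketch(int pidfd, int sig)
{
	/* info == NULL: the kernel fills in siginfo as kill(2) would.
	 * flags must currently be 0. */
	return (int)syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0);
}
```

A caller would use pidfd_send_signal_sketch(fd, SIGTERM) where it
would otherwise call kill(pid, SIGTERM), without the pid-reuse window.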

All of the complication comes from trying to bind a pid reference to
its /proc directory as well, which adds another way to reach that
directory apart from the mount namespace, when there is already a
race-free way to do so yourself.
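
One race-free way, sketched for the parent/child case: open the
child's /proc/<pid> directory before reaping it. Until the parent
waits on the child, its pid cannot be recycled, so the directory fd is
guaranteed to refer to the right process. The helper names are
hypothetical, and this assumes procfs is mounted at /proc, SIGCHLD is
not ignored, and no other thread reaps the child first.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open /proc/<pid> as a directory fd. */
static int open_proc_pid_dir(pid_t pid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%d", (int)pid);
	return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/* Fork a child and return a /proc/<pid> fd for it (or -1 on error).
 * The open cannot race with pid reuse: even if the child exits
 * early, it stays a zombie until the caller waits on it. */
static int spawn_with_proc_fd(pid_t *child_out)
{
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0) {
		pause();	/* stand-in for the child's real work */
		_exit(0);
	}
	*child_out = child;
	return open_proc_pid_dir(child);
}
```

The caller still has to terminate and waitpid() the child itself,
including when the open fails and -1 is returned after a successful
fork.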

> >
> > I can live with that but I would like to hear what Jann thinks too if
> > that's ok.
>
> Ah, yes. That seems reasonable. And, as Linus said, pidfd_open() is
> less important if you can just do open("/proc/...") on systems with
> procfs instead.
>
> One minor detail to keep in mind for the future is that in a
> straightforward implementation of this concept, if a non-capable
> process is running in a mount namespace, but in the initial network
> namespace, without any reachable /proc mount, it will be able to look
> at information about other processes' network connections by first
> using pidfd_open() on itself or by using clone(CLONE_PIDFD), then
> looking at the "net" directory under the resulting file descriptor.