Re: [PATCH v2 0/5] pid: add pidfd_open()

From: Daniel Colascione
Date: Mon Apr 01 2019 - 11:55:15 EST


On Mon, Apr 1, 2019 at 8:36 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Apr 1, 2019 at 4:41 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> >
> > Eric pitched a procfs2 which would *just* be the PIDs some time ago (in
> > an attempt to make it possible one day to mount /proc inside a container
> > without adding a bunch of masked paths), though it was just an idea and
> > I don't know if he ever had a patch for it.

Couldn't this mode just be a relatively simple procfs mount option
instead of a whole new filesystem? It'd be a bit like hidepid, right?
The internal bind mount option and the no-dotdot-traversal options
also look good to me.

> I wonder if we really want a full procfs2, or maybe we could just make
> the pidfd readable (yes, it's a directory file descriptor, but we
> could allow reading).

What would read(2) read?

> What are the *actual* use cases for opening /proc files through it? If
> it's really just for a small subset that android wants to do this
> (getting basic process state like "running" etc), rather than anything
> else, then we could skip the whole /proc linking entirely and go the
> other way instead (ie open_pidfd() would get that limited IO model,
> and we could make the /proc directory node get the same limited IO
> model).

We do a lot of process state inspection and manipulation, including
reading and writing the OOM killer adjustment score, reading smaps,
and occasionally manipulating cgroups. I'd also like to be able to
write a race-free pkill(1). Doing this work via pidfd would be
convenient. More generally, we can't enumerate the specific use cases,
because what we want to do with processes isn't bounded in advance,
and we regularly find new things in /proc/pid that we want to read and
write. I'd rather not prematurely limit the applicability of the pidfd
interface, especially when there's a simple option (the procfs
directory file descriptor approach) that requires neither in-advance
enumeration of supported process inspection and manipulation actions
nor a separate pidfd equivalent for each one. I very much want a
general-purpose API that reuses the metadata interfaces the kernel
already exposes. It's not clear to me how read(2) on a pidfd could
match this rich interface.
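
To make the directory file descriptor approach concrete, here is a
minimal userspace sketch. It assumes only that a /proc/<pid> directory
can be opened and used as the dirfd argument to openat(2). The helper
names proc_dir_open() and proc_read_at() are hypothetical, made up for
illustration, and are not part of this patch series or of any existing
API.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open /proc/<pid> as a directory file
 * descriptor. Returns the fd, or -1 with errno set.
 */
static int proc_dir_open(pid_t pid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%ld", (long)pid);
	return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/*
 * Hypothetical helper: read the procfs file `name`, resolved relative
 * to dirfd, into buf and NUL-terminate it. Returns the number of bytes
 * read, or -1 on error. Output longer than len - 1 bytes is truncated.
 */
static ssize_t proc_read_at(int dirfd, const char *name,
			    char *buf, size_t len)
{
	ssize_t n;
	int fd;

	if (len == 0)
		return -1;

	fd = openat(dirfd, name, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	n = read(fd, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;

	buf[n] = '\0';
	return n;
}
```

A caller would do something like dirfd = proc_dir_open(pid) once and
then proc_read_at(dirfd, "oom_score_adj", buf, sizeof(buf)), or the
same with "smaps" or any other file under /proc/pid. Nothing here has
to know in advance which files exist. Once the process behind the
directory fd has exited, lookups through the fd fail instead of
resolving to an unrelated process that later reuses the same PID.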