Re: [PATCH 2/4] pid: add pidfd_open()

From: Daniel Colascione
Date: Sat Mar 30 2019 - 03:39:30 EST


On Fri, Mar 29, 2019 at 11:25 PM Jonathan Kowalski <bl0pbl33p@xxxxxxxxx> wrote:
>
> On Sat, Mar 30, 2019 at 5:35 AM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 28, 2019 at 3:38 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
> > >
> > > > All that said, thanks for the work on this once again. My intention is
> > > > just that we don't end up with an API that could have been done better
> > > > and be cleaner to use for potential users in the coming years.
> > >
> > > Thanks for your input on all of this. I still don't find multiplexers in
> > > the style of seccomp()/fsconfig()/keyctl() to be a problem since they
> > > deal with a specific task. They are very much different from ioctl()s in
> > > that regard. But since Joel, you, and Daniel found the pidctl() approach
> > > not very nice I dropped it. The interface needs to be satisfactory for
> > > all of us especially since Android and other system managers will be the
> > > main consumers.
> >
> > Thanks.
> >
> > > So let's split this into pidfd_open(pid_t pid, unsigned int flags) which
> > > allows to cleanly get pidfds independent procfs and do the translation
> > > to procpidfds in an ioctl() as we've discussed in prior threads. This
> >
> > I sustain my objection to adding an ioctl. Compared to a system call,
> > an ioctl has a more rigid interface, greater susceptibility to
> > programmer error (due to the same ioctl control code potentially doing
> > different things for different file types), longer path length, and
> > more awkward filtering/monitoring/auditing/tracing. We've discussed
> > this issue at length before, and I thought we all agreed to use system
> > calls, not ioctl, for core kernel functionality. So why is an ioctl
> > suddenly back on the table? The way I see it, an ioctl has no
> > advantages except for 1) conserving system call numbers, which are not
> > scarce, and 2) avoiding the system call number coordination problem
> > (and the coordination problem isn't a factor for core kernel code). I
> > don't understand everyone's reluctance to add new system calls. What
> > am I missing? Why would we give up all the advantages that a system
> > call gives us?
> >
>
> I agree in general, but in this particular case a system call or an
> ioctl doesn't matter much, all it does is take the pidfd, the command,
> and /proc's dir fd.

Thanks again.

I agree that the operation we're discussing has a simple signature,
but signature flexibility isn't the only reason to prefer a system
call over an ioctl. There are other reasons for preferring system
calls to ioctls (safety, tracing, etc.) that apply even if the
operation we're discussing has a relatively simple signature: for
example, every system call has a distinct and convenient ftrace event,
but ioctls don't; strace filtering Just Works on a
system-call-by-system-call basis, but it doesn't for ioctls; and
documentation for system calls is much more discoverable (e.g., man
-k) than documentation for ioctls. Even if the distinction doesn't
matter much, IMHO, it still matters a little, enough to favor a system
call without an offsetting advantage for the ioctl option.

> If you start adding a system call for every specific operation on file
> descriptors, it *will* become a problem.

I'm not sure what you mean. Do you mean that adding a top-level system
call for every operation that might apply to one specific kind of file
descriptor would lead, as the overall result, to the kernel having
enough system calls to cause negative consequences? I'm not sure I
agree, but accepting this idea for the sake of discussion: shouldn't
we be more okay with system calls for features present on almost all
systems --- like procfs --- even if we punt to ioctls very rarely-used
functionality, e.g., some hypothetical special squeak noise that you
could get some specific 1995 adlib clone to make?

> Besides, the translation is
> just there because it is racy to do in userspace, it is not some well
> defined core kernel functionality.
> Therefore, it is just a way to
> enter the kernel to do the openat in a race free and safe manner.

I agree that the translation has to be done in the kernel, not
userspace, and that the kernel must provide to userspace some
interface for requesting that the translation happen: we're just
discussing the shape of this interface. Shouldn't all interfaces
provided by the kernel to userspace be equally well defined? I'm not
sure that the internal simplicity of the operation matters much
either. There are already explicit system calls for some
simple-to-implement things, e.g., timerfd_gettime. It's worth noting
that timerfd is (IIRC), like procfs, a feature that's both ubiquitous
and optional.

> As is, the facility being provided through an ioctl on the pidfd is
> not something I'd consider a problem.

You're right that from a signature perspective, using an ioctl isn't a
problem. I just want to make sure we take into account the other,
non-signature advantages that system calls have over ioctls.

> I think the translation stuff
> should also probably be an extension of ioctl_ns(2) (but I wouldn't be
> opposed if translate_pid is resurrected as is).
> For anything more involved than ioctl(pidfd, PIDFD_TO_PROCFD,
> procrootfd), I'd agree that a system call would be a cleaner
> interface, otherwise, if you cannot generalise it, using ioctls as a
> command interface is probably the better tradeoff here.

Sure: I just want to better understand everyone else's thought process
here, having been frustrated with things like the termios ioctls being
ioctls.