RE: [RFC PATCH] Minimal non-child process exit notification support

From: David Laight
Date: Wed Oct 31 2018 - 10:25:25 EST


From: Daniel Colascione
> Sent: 31 October 2018 12:56
> On Wed, Oct 31, 2018 at 12:27 PM, David Laight <David.Laight@xxxxxxxxxx> wrote:
> > From: Daniel Colascione
> >> Sent: 29 October 2018 17:53
> >>
> >> This patch adds a new file under /proc/pid, /proc/pid/exithand.
> >> Attempting to read from an exithand file will block until the
> >> corresponding process exits, at which point the read will successfully
> >> complete with EOF. The file descriptor supports both blocking
> >> operations and poll(2). It's intended to be a minimal interface for
> >> allowing a program to wait for the exit of a process that is not one
> >> of its children.
> >
> > Why do you need an extra file?
>
> Because no current file suffices.

That doesn't stop you making something work on any/all of the existing files.

> > It ought to be possible to use poll() to wait for POLLERR having set
> > 'events' to zero on any of the nodes in /proc/pid - or even on
> > the directory itself.
>
> That doesn't actually work today. And waiting on a directory with
> POLLERR would be very weird, since directories in general don't do
> things like blocking reads or poll support. A separate file with
> self-contained, well-defined semantics is cleaner.

Device drivers will (well ought to) return POLLERR when a device
is removed.
Making procfs behave the same way wouldn't be too stupid.

> > Indeed, to avoid killing the wrong process you need to have opened
> > some node of /proc/pid/* (maybe cmdline) before sending the kill
> > signal.
>
> The kernel really needs better documentation of the semantics of
> procfs file descriptors. You're not the only person to think,
> mistakenly, that keeping a reference to a /proc/$PID/something FD
> reserves $PID and prevents it being used for another process. Procfs
> FDs do no such thing. kill(2) is unsafe whether or not
> /proc/pid/cmdline or any other /proc file is open.

Interesting.
Linux 'fixed' the problem of pid reuse in the kernel by adding (IIRC)
'struct pid' that reference counts the pid stopping reuse.
But since the pids are still allocated sequentially userspace can
still reference a pid that is freed and immediately reused.
I'd have thought that procfs nodes held a reference count on the 'struct pid'.
There's probably no reason why it shouldn't.

Annoyingly non-GPL drivers can't release references to 'struct pid' so
are very constrained about which processes they can signal.
I also managed to use a stale 'struct pid' and kill the wrong process
- much more likely that the pid number being reused.

If you look at the NetBSD pid allocator you'll see that it uses the
low pid bits to index an array and the high bits as a sequence number.
The array slots are also reused LIFO, so you always need a significant
number of pid allocate/free before a number is reused.
The non-sequential allocation also makes it significantly more difficult
to predict when a pid will be reused.
The table size is doubled when it gets nearly full.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)