Re: [RFC PATCH] Minimal non-child process exit notification support

From: Daniel Colascione
Date: Mon Oct 29 2018 - 16:01:31 EST


Thanks for taking a look.

On Mon, Oct 29, 2018 at 7:45 PM, Joel Fernandes <joelaf@xxxxxxxxxx> wrote:
>
> On Mon, Oct 29, 2018 at 10:53 AM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> >
> > This patch adds a new file under /proc/pid, /proc/pid/exithand.
> > Attempting to read from an exithand file will block until the
> > corresponding process exits, at which point the read will successfully
> > complete with EOF. The file descriptor supports both blocking
> > operations and poll(2). It's intended to be a minimal interface for
> > allowing a program to wait for the exit of a process that is not one
> > of its children.
> >
> > Why might we want this interface? Android's lmkd kills processes in
> > order to free memory in response to various memory pressure
> > signals. It's desirable to wait until a killed process actually exits
> > before moving on (if needed) to killing the next process. Since the
> > processes that lmkd kills are not lmkd's children, lmkd currently
> > lacks a way to wait for a process to actually die after being sent
> > SIGKILL; today, lmkd resorts to polling the proc filesystem pid
>
> Any idea why it needs to wait and then send SIGKILL? Why not send
> SIGKILL and look for errno == ESRCH in a loop with a delay?

I want to get polling loops out of the system. Polling loops are bad
for wakeup attribution, bad for power, bad for priority inheritance,
and bad for latency. There's no right answer to the question "How long
should I wait before checking $CONDITION again?". If we can have an
explicit waitqueue interface to something, we should. Besides, PID
polling is vulnerable to PID reuse, whereas this mechanism (just like
anything based on struct pid) is immune to it.
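
To make the objection concrete, the suggested loop would look roughly
like the sketch below. It assumes the caller has already sent SIGKILL.
poll_until_gone() and delay_ms are names invented here for
illustration and are not taken from lmkd.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/*
 * Illustrative only: poll with kill(pid, 0) until the kernel reports
 * ESRCH. Returns the number of sleeps taken, or -1 on an unexpected
 * error (e.g. EPERM). delay_ms is an arbitrary guess by the caller.
 */
int poll_until_gone(pid_t pid, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	int sleeps = 0;

	for (;;) {
		/* Signal 0 performs only the existence and permission checks. */
		if (kill(pid, 0) == -1)
			return errno == ESRCH ? sleeps : -1;

		/*
		 * Success only means that some process owns this PID right
		 * now. It may be a zombie that has not been reaped, or an
		 * unrelated process that got the recycled PID.
		 */
		nanosleep(&delay, NULL);
		sleeps++;
	}
}
```

Every pass through the loop is a wakeup that does no useful work, and
a smaller delay_ms only trades power for latency.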

> > entry. This interface allows lmkd to give up polling and instead block
> > and wait for process death.
>
> Can we use ptrace(2) for the exit notifications? I am assuming you
> already thought about it but I'm curious what is the reason this is
> better.

Only one process can ptrace a given process at a time, so I don't like
ptrace as a mechanism for anything except debugging.

Relying on ptrace for exit notification would interfere with things
like debuggers and crash dump collection systems. Besides, ptrace can
do too much (like read and write process memory) and so requires very
strong privileges not necessary for this mechanism. Finally, ptrace's
interface is complicated and relies on repeated calls to various wait
functions, whereas the interface in this patch is simple enough to use
from the shell.
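
For comparison, a C caller might look something like the sketch below.
It assumes the semantics described in the patch text (reads block
until the process exits, then return EOF, and the fd works with
poll(2)). The exithand file exists only with this RFC patch applied.
wait_for_eof() and wait_for_pid_exit() are names made up for
illustration, and requesting POLLIN is an assumption about how the
patch reports readiness.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait until fd reaches EOF. Returns 0 on EOF, 1 on timeout, -1 on
 * error. The timeout restarts if poll() is interrupted by a signal.
 */
int wait_for_eof(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[16];
	ssize_t n;
	int ret;

	for (;;) {
		ret = poll(&pfd, 1, timeout_ms);
		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0)
			return 1;	/* timed out, fd still not readable */

		n = read(fd, buf, sizeof(buf));
		if (n == 0)
			return 0;	/* EOF */
		if (n < 0 && errno != EINTR && errno != EAGAIN)
			return -1;
		/* Data or a transient error: keep waiting for EOF. */
	}
}

/*
 * Hypothetical wrapper around the proposed /proc/<pid>/exithand file.
 * On a kernel without the patch, open() fails and this returns -1.
 */
int wait_for_pid_exit(pid_t pid, int timeout_ms)
{
	char path[64];
	int fd, ret;

	snprintf(path, sizeof(path), "/proc/%d/exithand", (int)pid);
	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	ret = wait_for_eof(fd, timeout_ms);
	close(fd);
	return ret;
}
```

Because the file is opened before waiting, the open fd rather than
the numeric PID identifies the process, which is what would make the
wait immune to PID reuse.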