Re: [GIT PULL] pidfd fixes

From: Linus Torvalds
Date: Mon Jul 22 2019 - 12:27:30 EST


On Mon, Jul 22, 2019 at 7:26 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
>
> This contains a fix for pidfd polling. It ensures that the task's exit
> state is visible to all waiters:

Hmm.

I've pulled this, but the exit_state thing has been very fragile
before, and I'm not entirely happy with how this just changes where it
is set. I guess the movement here is all inside the tasklist_lock, so
it's not that big of a deal, but still..

I would *really* like Oleg to take a look.

Also, and the primary reason I write this email is that this basically
makes the "EXIT_ZOMBIE / EXIT_DEAD" state handling look all kinds of
crazy. You set it to EXIT_ZOMBIE potentially _twice_. Whaa?

So if we set EXIT_ZOMBIE early, then I think we should change the
EXIT_DEAD case too. IOW, do something like this on top:

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -734,9 +734,10 @@ static void exit_notify(struct task_struct
*tsk, int group_dead)
autoreap = true;
}

- tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
- if (tsk->exit_state == EXIT_DEAD)
+ if (autoreap) {
+ tsk->exit_state = EXIT_DEAD;
list_add(&tsk->ptrace_entry, &dead);
+ }

/* mt-exec, de_thread() is waiting for group leader */
if (unlikely(tsk->signal->notify_count < 0))

where now the logic becomes "ok, we turned into a zombie above, and if
we autoreap this thread then we turn the zombie into a fully dead
thread".

Because currently we end up having "first turn it into a zombie", then
"set it to zombie or dead depending on autoreap" and then "if we
turned it into dead, move it to the dead list".

That just feels confused to me. Comments?

Linus