Re: [PATCH 0/2] fix the traced mt-exec deadlock
From: Oleg Nesterov
Date: Fri Mar 03 2017 - 12:33:41 EST
On 03/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>
> > our discussion was a bit confusing, and it seems that we did not
> > fully convince each other. So let me ask what do you finally think
> > about this fix.
> >
> > Let me repeat. Even if I do not agree with some of your objections,
> > I do agree that 1/2 does not look nice and clean. And we seem to
> > agree that either way, with or without this fix, we need more changes
> > in this area.
> >
> > But we need a simple and backportable fix for stable trees, say for
> > rhel7. This bug was reported many times, and this is the simplest
> > solution I was able to find.
>
> I am not 100% convinced that we need a backportable solution,
I think we need, this was already requested,
> but
> regardless my experience is that good clean solutions are almost always
> easier to backport that something we just settle for.
Sure.
> The patch below needs a little more looking and testing but arguably
> it is sufficient.
>
> It implements autoreaping for non-leader threads always. We might want
> to limit this to the case of exec.
I should have mentioned this. Of course, this change was proposed from the
very beginning, when this problem was reported first time. And of course
I like this fix much more than my patch (but yes, we shouldd limit it to
the case of exec).
The only problem is that it is incompatible change, and I have no idea what
can be broken.
Trivial test-case:
void *thread(void *arg)
{
for (;;)
pause();
return NULL;
}
int main(void)
{
pthread_t pt;
pthread_create(&pt, NULL, thread, NULL);
execlp("true", "true", NULL);
}
with your patch applied
$ strace -f ./test
strace: wait4(__WALL): No child processes
Yes, this is just a warning, but still. Now we need to change strace. And we
can't know what else can be confused/broken by this change.
man(ptrace) even documents that all other threads except the thread group leader
report death as if they exited via _exit(2).
Yes, yes, it also says that other threads stop in PTRACE_EVENT_EXIT stop,
so 2/2 (which we need with your change too) is is not compatible too and
I am worried, but:
- this was never really true, an already exiting thread won't stop
if it races with exec
- PTRACE_O_TRACEEXEC is not used that often, it never really worked
- man(ptrace) also mentions that PTRACE_EVENT_EXIT behaviour may be
change in the future.
In short. Of course I considered this change. Looks too risky to me. But.
I will be happy if you send this change and take all the blame ;) I won't
argue (if you limit it to execve case).
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -690,7 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> thread_group_empty(tsk) &&
> !ptrace_reparented(tsk) ?
> tsk->exit_signal : SIGCHLD;
> - autoreap = do_notify_parent(tsk, sig);
> + do_notify_parent(tsk, sig);
> + /* Autoreap threads even when ptraced */
> + autoreap = !thread_group_leader(tsk);
> } else if (thread_group_leader(tsk)) {
> autoreap = thread_group_empty(tsk) &&
> do_notify_parent(tsk, tsk->exit_signal);
This is all you need,
> @@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> }
>
> tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
> - if (tsk->exit_state == EXIT_DEAD)
> - list_add(&tsk->ptrace_entry, &dead);
>
> /* mt-exec, de_thread() is waiting for group leader */
> if (unlikely(tsk->signal->notify_count < 0))
> @@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> list_del_init(&p->ptrace_entry);
> release_task(p);
> }
> + if (autoreap)
> + release_task(tsk);
These 2 changes are not needed. release_task(tsk) will be called by
list_for_each_entry_safe() above if autoreap == T.
Oleg.