Re: [PATCH v9] exec: Fix dead-lock in de_thread with ptrace_attach

From: Bernd Edlinger
Date: Wed Jun 16 2021 - 17:31:33 EST


On 6/15/21 4:26 PM, Bernd Edlinger wrote:
> Thanks for your review.
>
> On 6/14/21 6:42 PM, Eric W. Biederman wrote:
>> Bernd Edlinger <bernd.edlinger@xxxxxxxxxx> writes:
>>
>>> This introduces signal->unsafe_execve_in_progress,
>>> which is used to fix the case when at least one of the
>>> sibling threads is traced, and therefore the trace
>>> process may dead-lock in ptrace_attach, but de_thread
>>> will need to wait for the tracer to continue execution.
>>
>> Userspace processes hang waiting for each other. Not a proper kernel
>> deadlock. Annoying but not horrible. Definitely worth fixing if we can.
>>
>
> I wonder if I am used a wrong term in the title.
> Do you have a suggestion for better wording?
>
>>> The solution is to detect this situation and allow
>>> ptrace_attach to continue, while de_thread() is still
>>> waiting for traced zombies to be eventually released.
>>> When the current thread changed the ptrace status from
>>> non-traced to traced, we can simply abort the whole
>>> execve and restart it by returning -ERESTARTSYS.
>>> This needs to be done before changing the thread leader,
>>> because the PTRACE_EVENT_EXEC needs to know the old
>>> thread pid.
>>
>> Except you are not detecting this situation. Testing for t->ptrace
>> finds tasks that have completed their ptrace attach and no longer need
>> the cred_gaurd_mutex.
>>
>
> The first phase of de_thread needs co-operation from a user task,
> if and only if any task t except the thread leader has t->ptrace.
> Taking tasks from RUNNING->EXIT_ZOMBIE only needs co-operation from kernel code,


Aehm, sorry, that is not correct, what I said here.

I totally overlooked ptrace(PTRACE_SEIZE, pid, 0L, PTRACE_O_TRACEEXIT)

and unfortunately this also prevents even the thread leader to enter the
EXIT_ZOMBIE state because do_exit does:

ptrace_event(PTRACE_EVENT_EXIT, code);

unfortunately this sends an event to the tracer, and waits not only for
the tracer to call waitpid, but also needs a PTRACE_CONT before do_exit
can call exit_notify which does tsk->exit_state = EXIT_ZOMBIE.

So unfortunately this breaks my patch, so I have to withdraw it for now,
since I see no way how to fix it.

I will clean-up my previous patch which changes the ptrace API to return
an error if an unsafe execve is detected, and send it to this list.


Thanks
Bernd.