Re: Threads stuck in zap_pid_ns_processes()

From: Eric W. Biederman
Date: Fri May 12 2017 - 09:33:02 EST


Vovo Yang <vovoy@xxxxxxxxxx> writes:

> On Fri, May 12, 2017 at 7:19 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Guenter Roeck <linux@xxxxxxxxxxxx> writes:
>>
>>> What I know so far is
>>> - We see this condition on a regular basis in the field. Regular is
>>> relative, of course - let's say maybe 1 in a Million Chromebooks
>>> per day reports a crash because of it. That is not that many,
>>> but it adds up.
>>> - We are able to reproduce the problem with a performance benchmark
>>> which opens 100 chrome tabs. While that is a lot, it should not
>>> result in a kernel hang/crash.
>>> - Vovo provided the test code last night. I don't know if this is
>>> exactly what is observed in the benchmark, or how it relates to the
>>> benchmark in the first place, but it is the first time we are actually
>>> able to reliably create a condition where the problem is seen.
>>
>> Thank you. It will be interesting to hear what is happening in the
>> chrome performance benchmark that triggers this.
>>
> What's happening in the benchmark:
> 1. A chrome renderer process was created with CLONE_NEWPID
> 2. The process crashed
> 3. Chrome breakpad service calls ptrace(PTRACE_ATTACH, ..) to attach to every
> thread of the crashed process to dump info
> 4. When breakpad detaches from the crashed process, the crashed process gets
> stuck in zap_pid_ns_processes()

Very interesting, thank you.

So the question is specifically which interaction is causing this.

In the test case provided it was a sibling task in the pid namespace
dying and not being reaped. Which may be what is happening with
breakpad. So far I have yet to see a kernel bug, but I won't rule one out.
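
For anyone following along, "dying and not being reaped" can be observed
from userspace without any pid namespace at all. The sketch below is only
an illustration under that assumption; it is not the test case from this
thread and it does not reproduce the hang. It forks a child that exits,
uses waitid() with WNOWAIT so the exit is observed while the child is left
unreaped, and reads the state letter from /proc/<pid>/stat. The helper
names task_state() and make_unreaped_child() are made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the state letter from /proc/<pid>/stat
 * ('Z' for a zombie), or 0 if the task no longer exists or the file
 * cannot be parsed. Assumes /proc is mounted.
 */
char task_state(pid_t pid)
{
	char path[64], buf[512];
	FILE *f;
	size_t n;
	char *p;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* The comm field may itself contain ')', so search from the end. */
	p = strrchr(buf, ')');
	if (!p || p[1] != ' ' || p[2] == '\0')
		return 0;
	return p[2];
}

/*
 * Hypothetical helper: fork a child that exits immediately, block until
 * it has exited, but leave it unreaped (WNOWAIT). Returns the child's
 * pid, or -1 on error.
 */
pid_t make_unreaped_child(void)
{
	siginfo_t info;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);
	if (waitid(P_PID, pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;
	return pid;
}
```

Between make_unreaped_child() and a later waitpid() on the returned pid,
task_state() should report 'Z'; once waitpid() has reaped the child the
task is gone. As I understand it, a task left in that state inside a pid
namespace is the kind of thing zap_pid_ns_processes() ends up waiting on.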

Eric