Re: [PATCH v2] kernel: release ptraced tasks before zap_pid_ns_processes

From: Jiri Slaby
Date: Tue Feb 26 2019 - 04:19:09 EST


On 10. 01. 19, 18:52, Andrei Vagin wrote:
> Currently, exit_ptrace() adds all ptraced tasks to a dead list, then
> zap_pid_ns_processes() waits for all tasks in the current pidns, and
> only then are the tasks on the dead list released.
>
> zap_pid_ns_processes() can get stuck waiting for tasks on the dead list.
> In this case, we end up with one unkillable process with one or more
> dead children.
>
> Thanks to Oleg for the advice to release tasks in find_child_reaper().
>
> Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
>
> Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Andrei Vagin <avagin@xxxxxxxxx>
> ---
>
> v2: Oleg showed that ptraced tasks can be released in
> find_child_reaper(). This avoids an additional
> write_lock/unlock(tasklist), and another list_for_each_entry_safe(dead)
> loop runs only if it is actually needed.
>
> kernel/exit.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 2d14979577ee..5df787a497f5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -558,12 +558,14 @@ static struct task_struct *find_alive_thread(struct task_struct *p)
>  	return NULL;
>  }
>
> -static struct task_struct *find_child_reaper(struct task_struct *father)
> +static struct task_struct *find_child_reaper(struct task_struct *father,
> +						struct list_head *dead)
>  	__releases(&tasklist_lock)
>  	__acquires(&tasklist_lock)
>  {
>  	struct pid_namespace *pid_ns = task_active_pid_ns(father);
>  	struct task_struct *reaper = pid_ns->child_reaper;
> +	struct task_struct *p, *n;
>
>  	if (likely(reaper != father))
>  		return reaper;
> @@ -579,6 +581,12 @@ static struct task_struct *find_child_reaper(struct task_struct *father)
>  		panic("Attempted to kill init! exitcode=0x%08x\n",
>  			father->signal->group_exit_code ?: father->exit_code);
>  	}
> +
> +	list_for_each_entry_safe(p, n, dead, ptrace_entry) {
> +		list_del_init(&p->ptrace_entry);
> +		release_task(p);
> +	}
> +

Hi,

our (SUSE) QA reports that this patch causes a performance regression in
the libMicro pthread_* benchmarks. The report is at:
https://bugzilla.suse.com/show_bug.cgi?id=1126762

I tried it myself, using the repo:
https://github.com/redhat-performance/libMicro

I ran
pthread_create -B 8 -C 200 -S

and without the patch:
# STATISTICS usecs/call (raw) usecs/call (outliers removed)
# mean 23.38611 17.29311

With the patch applied:
# mean 41.36539 39.21347

The values vary, but they are around 23 and 42, respectively.

The benchmark seems to create 8 (-B above) pthreads, which each do a
lock/unlock and then exit. The benchmark reaps the threads via
pthread_join. This all happens 200 times (-C above).

Any idea how to restore the performance close to the previous state?

>  	zap_pid_ns_processes(pid_ns);
>  	write_lock_irq(&tasklist_lock);
>
> @@ -668,7 +676,7 @@ static void forget_original_parent(struct task_struct *father,
>  	exit_ptrace(father, dead);
>
>  	/* Can drop and reacquire tasklist_lock */
> -	reaper = find_child_reaper(father);
> +	reaper = find_child_reaper(father, dead);
>  	if (list_empty(&father->children))
>  		return;

thanks,
--
js
suse labs