Re: [PATCH 2/3] pidns: Guarantee that the pidns init will be the last pidns process reaped.

From: Oleg Nesterov
Date: Thu May 17 2012 - 13:01:38 EST


On 05/16, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>
> > Hmm. I don't think the patch is 100% correct. Afaics, this needs more
> > delay_pidns_leader() checks.
> >
> > For example. Suppose we have a CLONE_NEWPID zombie I, it has an
> > EXIT_DEAD child D so delay_pidns_leader(I) == T.
> >
> > Now suppose that I->real_parent exits, let's denote this task as P.
> >
> > Suppose that P->real_parent ignores SIGCHLD.
> >
> > In this case P will do release_task(I) prematurely. And worse, when
> > D finally does release_task(D) it will do release_task(I) again.
>
> Good point. I will fix that and post a patch shortly. It doesn't
> need a full delay_pidns_leader test just a test for children.

This will add more complications. And even this is not enough, I guess.
For example __ptrace_detach()...

I agree, the idea to "hack" release_task() so that it switches to
init is clever, but imho this is too clever ;)

Seriously, what do you think about the patch below? Or something
like this. It is still based on your suggestion to check ->children,
but it is much, much simpler and easier to understand.

Just in case... Even with the PF_EXITING check __wake_up_parent()
can be wrong, but this is very unlikely and harmless.

What do you think?
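
To make the idea concrete, here is a toy userspace model of what the
patch below is meant to do. It is only a sketch under loud assumptions:
it is not kernel code, every toy_* name is hypothetical, a plain counter
stands in for the ->children list, and a wakeup counter stands in for
__wake_up_parent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_task {
	struct toy_task *parent;
	int nr_children;	/* stands in for the ->children list */
	bool exiting;		/* stands in for PF_EXITING */
	bool is_ns_init;	/* stands in for being pid_ns->child_reaper */
	int wakeups;		/* counts modelled __wake_up_parent() calls */
};

/*
 * Models the new tail of __unhash_process(): once the last child of an
 * exiting namespace init has been unhashed, wake that init up.
 */
static void toy_unhash(struct toy_task *p)
{
	struct toy_task *parent = p->parent;

	assert(parent != NULL);
	parent->nr_children--;	/* models list_del_init(&p->sibling) */

	if (parent->is_ns_init && parent->exiting &&
	    parent->nr_children == 0)
		parent->wakeups++;
}

/* Models the wait_event() condition added to zap_pid_ns_processes(). */
static bool toy_init_may_exit(const struct toy_task *init)
{
	return init->nr_children == 0;
}
```

With two children, the first toy_unhash() leaves toy_init_may_exit()
false and records no wakeup. The second one records exactly one wakeup
and lets init proceed.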

> In looking for any other weird corner case bugs I am noticing that
> I don't think I handled the case of a ptraced init quite right.
> I don't understand the change signaling semantics when the
> ptracer is our parent.

Do you mean the "if (tsk->ptrace)" code in exit_notify()? Nobody
understands it ;) Last time this code was modified by me (iirc), but
I simply tried to preserve the previous behaviour.

Oleg.

--- x/kernel/exit.c
+++ x/kernel/exit.c
@@ -63,6 +63,13 @@ static void exit_mm(struct task_struct *
 
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
+	struct task_struct *parent = p->parent;
+	bool parent_is_init = false;
+
+#ifdef CONFIG_PID_NS
+	parent_is_init = (task_active_pid_ns(p)->child_reaper == parent);
+#endif
+
 	nr_threads--;
 	detach_pid(p, PIDTYPE_PID);
 	if (group_dead) {
@@ -72,6 +79,11 @@ static void __unhash_process(struct task
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
 		__this_cpu_dec(process_counts);
+
+		if (parent_is_init && (parent->flags & PF_EXITING)) {
+			if (list_empty(&parent->children))
+				__wake_up_parent(p, parent);
+		}
 	}
 	list_del_rcu(&p->thread_group);
 }
--- x/kernel/pid_namespace.c
+++ x/kernel/pid_namespace.c
@@ -184,6 +184,9 @@ void zap_pid_ns_processes(struct pid_nam
 		rc = sys_wait4(-1, NULL, __WALL, NULL);
 	} while (rc != -ECHILD);
 
+	wait_event(&current->signal->wait_chldexit,
+		   list_empty(&current->children));
+
 	if (pid_ns->reboot)
 		current->signal->group_exit_code = pid_ns->reboot;
 
