Re: [PATCH 2/3] pidns: Guarantee that the pidns init will be the last pidns process reaped.

From: Eric W. Biederman
Date: Thu May 17 2012 - 17:47:13 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 05/16, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>>
>> > Hmm. I don't think the patch is 100% correct. Afaics, this needs more
>> > delay_pidns_leader() checks.
>> >
>> > For example. Suppose we have a CLONE_NEWPID zombie I, it has an
>> > EXIT_DEAD child D so delay_pidns_leader(I) == T.
>> >
>> > Now suppose that I->real_parent exits, let's denote this task as P.
>> >
>> > Suppose that P->real_parent ignores SIGCHLD.
>> >
>> > In this case P will do release_task(I) prematurely. And worse, when
>> > D finally does release_task(D) it will do release_task(I) again.
>>
>> Good point. I will fix that and post a patch shortly. It doesn't
>> need a full delay_pidns_leader test, just a test for children.
>
> This will add more complications. And even this is not enough, I guess.
> For example __ptrace_detach()...

Agreed. I am having to step back and think about this a bit more.

I don't like doing things two different ways, but delay_group_leader()
and all of that is pretty horrible from a maintenance point of view,
and extending it just makes things worse.

> I agree, the idea to "hack" release_task() so that it switches to
> init is clever, but imho this is too clever ;)
>
> Seriously, what do you think about the patch below? Or something
> like this. It is still based on your suggestion to check ->children,
> but it is much, much more simple and understandable.
>
> Just in case... Even with the PF_EXITING check __wake_up_parent()
> can be wrong, but this is very unlikely and harmless.
>
> What do you think?

I think there is something very compelling about your solution,
though we do still need my bit about making the init process ignore
SIGCHLD so that all of init's children self-reap.

Before I go further I am going to play with the code more.

In part I think the current code for waiting for processes to
die etc. is pretty horrible maintenance-wise, and it might just
be worth cleaning up before we extend it with yet another
strange and bizarre case, if for no other reason than to make
it clear what we are doing.


>> In looking for any other weird corner case bugs I am noticing that
>> I don't think I handled the case of a ptraced init quite right.
>> I don't understand the change signaling semantics when the
>> ptracer is our parent.
>
> Do you mean the "if (tsk->ptrace)" code in exit_notify()? Nobody
> understands it ;) Last time this code was modified by me (iirc), but
> I simply tried to preserve the previous behaviour.

Yes. It is some pretty strange code. Especially where we are reading
a return result which is always false. I think there is a bug somewhere
between that code and ptrace detach but I don't know that I could tell
you what it is.

Hopefully I have a follow-on patch in another couple of hours.

Eric


> Oleg.
>
> --- x/kernel/exit.c
> +++ x/kernel/exit.c
> @@ -63,6 +63,13 @@ static void exit_mm(struct task_struct *
>
>  static void __unhash_process(struct task_struct *p, bool group_dead)
>  {
> +	struct task_struct *parent = p->parent;
> +	bool parent_is_init = false;
> +
> +#ifdef CONFIG_PID_NS
> +	parent_is_init = (task_active_pid_ns(p)->child_reaper == parent);
> +#endif
> +
>  	nr_threads--;
>  	detach_pid(p, PIDTYPE_PID);
>  	if (group_dead) {
> @@ -72,6 +79,11 @@ static void __unhash_process(struct task
>  		list_del_rcu(&p->tasks);
>  		list_del_init(&p->sibling);
>  		__this_cpu_dec(process_counts);
> +
> +		if (parent_is_init && (parent->flags & PF_EXITING)) {
> +			if (list_empty(&parent->children))
> +				__wake_up_parent(p, parent);
> +		}
>  	}
>  	list_del_rcu(&p->thread_group);
>  }
> --- x/kernel/pid_namespace.c
> +++ x/kernel/pid_namespace.c
> @@ -184,6 +184,9 @@ void zap_pid_ns_processes(struct pid_nam
>  		rc = sys_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>
> +	wait_event(current->signal->wait_chldexit,
> +		   list_empty(&current->children));
> +
>  	if (pid_ns->reboot)
>  		current->signal->group_exit_code = pid_ns->reboot;
>
>
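
The wait added to zap_pid_ns_processes() above follows the usual
"sleep until a condition holds, get woken by whoever changes it"
pattern. A minimal userspace sketch of that pattern, assuming pthreads
stand in for tasks: struct reaper, child_exit() and wait_for_children()
are hypothetical names for illustration, with a counter modelling
list_empty(&parent->children) and a condition variable modelling
signal->wait_chldexit. It is not kernel code and does not model
PF_EXITING or EXIT_DEAD.

```c
#include <pthread.h>

#define MAX_CHILDREN 16

/* Hypothetical stand-in for the reaper's ->children list and wait queue. */
struct reaper {
	pthread_mutex_t lock;
	pthread_cond_t chldexit;	/* models signal->wait_chldexit */
	int nr_children;		/* 0 models list_empty(&->children) */
};

/* Models a child being unhashed: unlink, then wake the parent if last. */
static void *child_exit(void *arg)
{
	struct reaper *r = arg;

	pthread_mutex_lock(&r->lock);
	if (--r->nr_children == 0)
		pthread_cond_broadcast(&r->chldexit);
	pthread_mutex_unlock(&r->lock);
	return NULL;
}

/* Models the exiting init: returns the child count seen after the wait. */
static int wait_for_children(int n)
{
	struct reaper r;
	pthread_t tids[MAX_CHILDREN];
	int i, started = 0, left;

	if (n < 0 || n > MAX_CHILDREN)
		return -1;

	pthread_mutex_init(&r.lock, NULL);
	pthread_cond_init(&r.chldexit, NULL);
	r.nr_children = n;

	for (i = 0; i < n; i++) {
		if (pthread_create(&tids[started], NULL, child_exit, &r)) {
			/* A child that never started cannot unlink itself. */
			pthread_mutex_lock(&r.lock);
			r.nr_children--;
			pthread_mutex_unlock(&r.lock);
			continue;
		}
		started++;
	}

	/* Re-check the condition after every wakeup, as wait_event() does. */
	pthread_mutex_lock(&r.lock);
	while (r.nr_children > 0)
		pthread_cond_wait(&r.chldexit, &r.lock);
	left = r.nr_children;
	pthread_mutex_unlock(&r.lock);

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&r.chldexit);
	pthread_mutex_destroy(&r.lock);
	return left;
}
```

wait_for_children(4) returns 0 once all four threads have "exited".
The condition is re-checked in a loop, so a spurious or early wakeup
is harmless, which is the same reason an occasionally wrong
__wake_up_parent() call would be harmless in the patch.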
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/