Re: [path][rfc] add PR_DETACH prctl command

From: Oleg Nesterov
Date: Thu Mar 31 2011 - 13:03:10 EST


Hi Stas,

On 03/31, Stas Sergeev wrote:
>
> I found some time to get back to that patch and
> to address all of the problems you pointed.
> What do you think about the attached patch?
> I didn't expect it would became that big.

fs/proc/array.c | 7 -
include/asm-generic/siginfo.h | 3
include/linux/init_task.h | 2
include/linux/prctl.h | 2
include/linux/sched.h | 21 +++-
kernel/exit.c | 200 +++++++++++++++++++++++++++++++++++-------
kernel/fork.c | 4
kernel/signal.c | 59 +++++++-----
kernel/sys.c | 45 +++++++++
9 files changed, 281 insertions(+), 62 deletions(-)

Eek! Not only is it big, it is also complex and changes a lot of core
kernel code.

Sorry Stas, I am not going to try to review it carefully. As I said,
you need to convince lkml we need this feature first. And iirc you
are not going to suggest this change for everyone.

I guess the main complication is that you are trying to ensure the
old parent can do wait() without -ECHILD... This complicates everything
soooooooooo much. I _feel_ this can be simplified... but in any case
we need the nasty complications. And for what?


I only looked at the sys_prctl() code, and almost every line looks wrong.
Hmm... in fact, the changes in exit.c look wrong too, but I didn't really
try to understand them.

> @@ -1736,6 +1737,50 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>  			else
>  				error = PR_MCE_KILL_DEFAULT;
>  			break;
> +		case PR_DETACH: {
> +			struct task_struct *p, *old_parent;
> +			int notif = DEATH_REAP;
> +			error = -EPERM;
> +			/* not detaching from init */
> +			if (me->real_parent == init_pid_ns.child_reaper)

Two problems. You shouldn't use init_pid_ns; you need the task's own
namespace. Also, the task can be the child of one of /sbin/init's
sub-threads, which this comparison misses.
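
To illustrate both points, here is a minimal userspace sketch, not kernel
code. The toy_* structs and helpers are hypothetical stand-ins. In the
kernel the corresponding pieces would presumably be task_active_pid_ns()
and same_thread_group(), but treat that mapping as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_pid_ns;

struct toy_task {
	int pid;
	struct toy_task *group_leader;	/* leader of this task's thread group */
	struct toy_task *real_parent;
	struct toy_pid_ns *ns;		/* pid namespace the task lives in */
};

struct toy_pid_ns {
	struct toy_task *child_reaper;	/* the "init" of this namespace */
};

/* Two tasks belong to the same thread group if they share a leader. */
static inline bool toy_same_thread_group(const struct toy_task *a,
					 const struct toy_task *b)
{
	return a->group_leader == b->group_leader;
}

/* The check as in the patch: one global reaper, plain pointer equality. */
static inline bool toy_parent_is_init_naive(const struct toy_task *t,
					    const struct toy_pid_ns *init_ns)
{
	return t->real_parent == init_ns->child_reaper;
}

/* Uses the task's own namespace and matches any thread of the reaper. */
static inline bool toy_parent_is_init(const struct toy_task *t)
{
	return toy_same_thread_group(t->real_parent, t->ns->child_reaper);
}
```

In this model the naive check misses a task whose parent is a sub-thread
of init, and also a task whose parent is the init of a child namespace.
The second helper catches both.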

> +			write_lock_irq(&tasklist_lock);
> +			old_parent = me->real_parent;
> +			me->detach_code = arg2 << 8;
> +			if (!task_detached(me))
> +				notif = do_signal_parent(me, me->exit_signal,
> +						CLD_DETACHED, arg2);

This is simply wrong. We reparent the whole thread group, so we should
either always notify the old parent or never. But this shouldn't depend
on the calling thread.

> +			if (notif != DEATH_REAP) {
> +				list_add_tail(&me->detached_sibling,
> +						&me->real_parent->detached_children);
> +				me->exit_state = EXIT_DETACHED;

No, no, we can't set ->exit_state != 0. This means the task is dead.

> +			if (!ptrace_reparented(me))
> +				me->parent = init_pid_ns.child_reaper;

Again, this shouldn't use init_pid_ns.child_reaper. But the main problem
is that you can't trust ptrace_reparented(). What if the old parent
ptraces this task?
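
A small userspace model of that case, as a sketch only. The pt_* names are
made up for illustration, and pt_ptrace_reparented() assumes the helper is
a plain "real_parent != parent" comparison, which is how I read it.

```c
#include <stdbool.h>

/* Toy task: hypothetical fields modelled on the ones the patch touches. */
struct pt_task {
	struct pt_task *real_parent;	/* the parent that forked the task */
	struct pt_task *parent;		/* gets SIGCHLD/wait: the tracer if traced */
	bool ptraced;
};

/* Assumed shape of the helper: true only if the tracer is not the parent. */
static inline bool pt_ptrace_reparented(const struct pt_task *t)
{
	return t->real_parent != t->parent;
}

/* The reparenting step from the patch, in miniature. */
static inline void pt_detach(struct pt_task *me, struct pt_task *reaper)
{
	if (!pt_ptrace_reparented(me))
		me->parent = reaper;	/* also taken when the old parent traces us */
	me->real_parent = reaper;
}
```

If the old parent is also the tracer, ->parent equals ->real_parent, the
helper returns false, and ->parent is overwritten with the reaper although
the task is still traced.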

> +			/* detaching makes us a group leader */
> +			me->group_leader = me;

How? No, we can't change ->group_leader, this is simply not possible
and very wrong. If nothing else, think about tid/tgid, but there are
a lot more problems.
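
To make the tid/tgid point concrete, a simplified userspace model. The
tg_* names are hypothetical, and deriving the tgid purely from the
leader's pid is an assumption made to keep the sketch short.

```c
/* Toy task: hypothetical, only the two fields needed for the argument. */
struct tg_task {
	int pid;			/* per-thread id, what gettid() would report */
	struct tg_task *group_leader;	/* leader of the thread group */
};

/* In this model the tgid (what getpid() would report) is the leader's pid. */
static inline int tg_tgid(const struct tg_task *t)
{
	return t->group_leader->pid;
}

/* The assignment from the patch. */
static inline void tg_make_leader(struct tg_task *me)
{
	me->group_leader = me;
}
```

Take a leader with pid 100 and sub-threads 101 and 102. After
tg_make_leader() on thread 101 its tgid becomes 101, while thread 102 and
the old leader still report 100, so threads of one process no longer
agree on the process id.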

> +			while_each_thread(me, p) {
> +				if (p->real_parent != old_parent)
> +					continue;
> +				if (!ptrace_reparented(p))
> +					p->parent = init_pid_ns.child_reaper;
> +				p->real_parent = init_pid_ns.child_reaper;

The same problems as above, plus "p->real_parent != old_parent" looks
bogus.


Well. Once again, I never argue with new features, but you need to
convince lkml. Probably it is simple to implement PR_DETACH so that
the task just "disappears" from the old_parent's radar. Otherwise
we need more complications, but I'd rather add the fake TASK_ZOMBIE
task_struct for that. This will be much, much simpler, although not
pretty anyway.

Oleg.
