Re: [PATCH 1/4] forget_original_parent: split out the un-ptracepart

From: Oleg Nesterov
Date: Wed Feb 25 2009 - 15:47:55 EST


On 02/24, Roland McGrath wrote:
>
> > > --- a/kernel/ptrace.c
> > > +++ b/kernel/ptrace.c
> > > @@ -534,7 +534,7 @@ repeat:
> > > * Set the ptrace bit in the process ptrace flags.
> > > * Then link us on our parent's ptraced list.
> > > */
> > > - if (!ret) {
> > > + if (!ret && !(current->real_parent->flags & PF_EXITING)) {
> > > current->ptrace |= PT_PTRACED;
> >
> > Yes sure.
> >
> > But this means exit_ptrace() must always take tasklist, otherwise we
> > don't have the necessary barriers.
>
> Really?
>
> exit_signals(tsk); /* sets PF_EXITING */
> /*
> * tsk->flags are checked in the futex code to protect against
> * an exiting task cleaning up the robust pi futexes.
> */
> smp_mb();
>
> This is an exactly analogous use, isn't it? So exit_ptrace() just has to
> follow this same existing barrier. Right?

Yes, we do have the barrier between "flags |= PF_EXITING" and
"if (list_empty(ptraced))" in exit_ptrace(), but it is not enough.

Because the exiting ->real_parent can both set PF_EXITING and return
from exit_ptrace() (without taking tasklist because it sees ->ptraced
is empty) right after the child checks ->real_parent->flags & PF_EXITING.

I am still thinking what can we do here (and btw my apologies for delay,
some stupid reasons distract me).

> > But from the _pure theoretical_ pov, it is not correct to assume that
> > list_empty(&tracer->ptraced) == T means that current can not be used
> > somehow as tracee->parent. Another subthread can release a dead tracee.
>
> I don't follow how that's relevant. If list_empty(), then it was empty or
> is becoming empty. It can't then become nonempty again (because the thread
> doing the check is the only one that adds to that list). That's all we're
> assuming.
>
> > For example, list_empty(&tracer->ptraced) == T doesn't mean that the
> > STOREs to this task_struct are finished, list_del_init(->ptrace_entry)
> > can still be in progress.
>
> Sure, but so what? The check is to verify that some new list_del* (and
> related cleanup work, of course) doesn't need to be *started*.

Well. I am starting to regret I mentioned this "problem" ;) Because even
if I am right (it is very possible I am not), this all is _absolutely_
theoretical. Let me try again to explain what I meant.

First of all, in theory write_lock_irq() does not imply rcu_read_lock().

Now let's suppose that the exiting task T does exit_ptrace(), sees the
empty ->ptraced list, and then can do release_task()->call_rcu(put_task_struct)
without taking the tasklist_lock on this path.

Let's also suppose that we race with another sub-thread which reaps
a zombie tracee and does __ptrace_unlink()->list_del_init(ptrace_entry).
__list_del does 1) next->prev = prev and 2) prev->next = next.

Let's suppose 2 is completed, but 1 is not.

T checks list_empty(->ptraced) and sees head->next == head. It proceeds
and calls call_rcu(put_task_struct).

Since (in theory!) we do not have rcu_read_lock(), it is possible that
task_struct is already freed when 1 write to the memory.


But actually I meant that this is not really safe "in general". Let's
suppose we change __ptrace_unlink() so that it does, say,
BUG_ON(child->parent->exit_state != 0) before untracing. Yes, sure,
this is ugly, but correct. Or BUG_ON(!child->parent->signal), or
whatever else.

But this is only correct because T takes tasklist before it actually
starts to "destroy" itself.


In short, my point is: even if exit_ptrace() sees list_empty(->ptraced),
it is possible that the just-untraced tracee "looks" at us and expects
that the former tracer is "alive" enough.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/