Re: ptrace fixes for 3.2

From: Tejun Heo
Date: Tue Jan 03 2012 - 14:18:18 EST


Hello,

On Tue, Jan 03, 2012 at 06:09:27PM +0100, Oleg Nesterov wrote:
> > > The offending commit is 823b018e which moved the EXIT_DEAD check,
> > > but in fact we should not blame it. The original code was not
> > > correct as well because it didn't take ptrace_reparented() into
> > > account and because we can't really trust ->ptrace.
> > >
> > > This patch adds the additional check to close this particular
> > > race but it doesn't solve the whole problem. We simply can't
> > > rely on ->ptrace in this case, it can be cleared if the tracer
> > > is multithreaded by the exiting ->parent.
> >
> > I'm not following this part. Can you please explain it in a bit more
> > detail?
>
> Before 823b018e the code was:
>
> if (!ptrace && p->ptrace) {
> wo->notask_error = 0;
> return 0;
> }
>
> if (p->exit_state == EXIT_DEAD)
> return 0;
>
> There are 2 problems:
>
> 1. it is not correct to clear ->notask_error unless
> this child is ptrace_reparented(). Nobody will
> wakeup us if EXIT_DEAD was set by our sub-thread.

Yeah, if that happens between normal and ptrace wait_consider_task()
calls.

> 2. We can not rely on ->ptrace to detect this case.
>
> Suppose that the tracer is multithreaded, it has
> two threads T1 and T2, T1 traces our child.
>
> - T2 does do_wait(WEXITED), sets EXIT_DEAD, drops
> tasklist_lock.
>
> - T1 exits and does __ptrace_detach(), this means
> __ptrace_unlink() and nothing more.
>
> - Now, real_parent does do_wait() and sees the
> EXIT_DEAD child but ->ptrace = 0.
>
> - finally T2 sets EXIT_DEAD but it is too late,

You mean EXIT_ZOMBIE here, right? So, to rephrase, ZOMBIE -> DEAD ->
ZOMBIE dancing may race against tracer detaching. Urgh... why do we
even allow tasks which aren't the direct tracer to wait for ptraced
children anyway when ptrace ops are restricted to the ptracing task?

> > > Also, I think wait_consider_task() needs more fixes. I do not
> > > think we should clear ->notask_error without WEXITED in this
> > > case, but this is what we do in the EXIT_ZOMBIE case.
> >
> > Hmmm... I'm not sure about that. Why do you think so?
>
> I am not sure too. But why do_wait() should sleep if it is called
> without WEXITED (lets ignore WCONTINUED) and the child is ZOMBIE?
> I think it should return -ECHILD, like it does if the child is not
> traced.
>
> IOW. Suppose we have a single EXIT_ZOMBIE child. If it is not traced,
> do_wait(WSTOPPED) returns -ECHILD. If the child is traced (by another
> process) do_wait() sleeps until detach just to return the same error.
> This looks a bit strange.

I don't think it really matter either way. To me, both behaviors seem
understandable and I don't think the current behavior would cause any
real problem.

Thanks a lot for the explanation.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/