Re: [PATCH 1/1] ptrace: make sure do_wait() won't hang afterPTRACE_ATTACH

From: Tejun Heo
Date: Mon Feb 21 2011 - 11:22:17 EST


Hey, :-)

On Fri, Feb 18, 2011 at 08:37:09PM +0100, Oleg Nesterov wrote:
> > Still trying to follow the new discussion.
>
> And how it goes?
>
> As for me, I am not sure I can follow it ;)

The issues Denys brought up are okay but I still haven't gotten my
head wrapped around what Jan and you are talking about. Urgh... :-)

> > Instead of having simple "a ptracer stops in TASK_TRACED and its
> > execution is under the control of ptrace",
>
> In fact, I am not sure I really disagree with this part, but see below.
>
> > The patch which puts the tracee into TASK_TRACED
> > on ATTACH already fix two problems discussed in this thread without
> > doing anything wonky. I think it says a lot.
>
> Yes. One off-topice note... if we are talking about this patch only,
> I do not think it makes sense to add the new member into task_struct
> so that STOPPED/TRACED transition can always report the exactly correct
> ->exit_code. I think we can just use group_exit_code ?: SIGSTOP.
> But, again, this is off-topic.

It shares the task->group_stop which is needed for other things
anyway, but yeah, if we're sure it's either that or SIGSTOP that would
definitely be better. Hmmm, but it can be other things. There are
many signals which can trigger group stop. Maybe this is not
important but then again preserving this doesn't cost us much either.

BTW, I plan on separating out all ptrace related stuff into a separate
struct as it's not used by most tasks anyway, so I don't think we need
to be too concerned about several more fields.

> > As it currently stands, SIGSTOP/CONT while ptraced doesn't work
>
> And this is probably where we disagree the most. I think this is bug,
> and this should be fixed.

I don't think we disagree that it is a bug. I want to fix it too but
we definitely seem to disagree on how. I want to give more control to
the ptracer so that the tracer has enough information and control to
follow the group stop semantics if it wants to and you want to give
more control to group stop so that it overrides the tracer and always
does the right thing regarding group stop.

> > and even if we bend the rules subtly and provide sneaky ways like
> > the above, userland needs to be modified to make use of it anyway.
>
> Yes. But with the current code we can't modify, say, strace so
> that SIGSTOP/CONT can work "correctly".

Agreed, not possible. The kernel needs to be improved one way or the
other.

> > I think it would be far cleaner to simply make ptracee always stop
> > in TASK_TRACED and give the ptracer a way to notice what's
> > happening to the tracee
>
> Well. If we accept the proposed PTRACE_CONT-needs-SIGCONT behaviour,
> then I think this probably makes sense. The tracee stops under ptrace,
> the possible SIGCONT shouldn't abuse debugger which wants to know, say,
> the state of registers.

The objections I have against PTRACE_CONT-needs-SIGCONT are,

* It will be very different from the current behavior.

* ptrace, sans the odd SIGSTOP on attach which we should remove, is
per-task. Sending out SIGCONT on PTRACE_CONT would break that. I
really don't think that's a good idea.

* PTRACE_CONT would be behaving completely differently depending on
whether it's resuming from group stop or other traps.

> To be honest, I don't understand whether I changed my mind now, or
> I was never against this particular change in behaviour.
>
> Once debugger does PTRACE_CONT, the tracee becomes TASK_STOPPED and
> now it is "visible" to SIGCONT (or the tracee resumes if SIGCONT has
> come in between).
>
> But I think you will equally blame this TRACED/STOPPED transition
> as "behavioral subtleties" and I can understand you even if I disagree.
> And yes, this leads to other questions. But note that this greatly
> simplifies things. The tracee can never participate in the same
> group-stop twice.

But that's not really because the problem is solved. The problem is
put out of scope by forcing the tracer to always override group stop.
That's a rather big departure from the current behavior and capability
and I frankly think is not a good direction to head to. It's like
giving up useful features for conceptual purity. We can make it work
without regressing on capabilities.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/