Re: [PATCH 1/1] ptrace: make sure do_wait() won't hang after PTRACE_ATTACH

From: Jan Kratochvil
Date: Fri Feb 18 2011 - 16:35:05 EST


On Thu, 17 Feb 2011 17:49:06 +0100, Oleg Nesterov wrote:
> > - that is to leave the process in
> > `T (stopped)' without any single PC step.
>
> This is not exactly clear to me... I mean "without any single PC step".
> Why?

Engineers investigating application problems send SIGSTOP to the application
when it is in a critical situation. Then they run gcore, gstack etc. Once
they are satisfied with the analysis they send SIGCONT.

If the application being investigated changes state between the various tools
it may be confusing as the dumps will not match. Also, in some cases the
critical state being investigated may get lost.
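
To make the workflow above concrete, here is a minimal sketch (Linux only,
assuming /proc is mounted) that stops a child with SIGSTOP and reads its
one-letter state from /proc/<pid>/stat. The helper names proc_state() and
spawn_stopped_child() are made up for this illustration; they are not part
of gcore, gstack or any existing tool.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state from /proc/<pid>/stat
 * ('T' = stopped, 't' = tracing stop), or '?' if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    FILE *f;
    size_t n;
    char *p;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name is parenthesised and may itself contain ')',
     * so look for the last one; the state letter follows ") ". */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Hypothetical helper: fork a child that only sleeps, send it SIGSTOP and
 * wait until the stop has been reported. Returns the pid, or -1 on error. */
static pid_t spawn_stopped_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();
    }
    if (kill(pid, SIGSTOP) < 0 ||
        waitpid(pid, &status, WUNTRACED) < 0 ||
        !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}
```

Between the SIGSTOP and the final SIGCONT, proc_state() should keep
reporting 'T' no matter which tools were run in between; that is the
invariant I am asking for.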


> > A new proposal is to preserve the process's `T (stopped)' for
> > a naive/legacy debugger / ptrace tool doing PTRACE_ATTACH, wait->SIGSTOP,
> > PTRACE_DETACH(0), incl. GDB doing the "GDB trick" above.
> > That is after PTRACE_DETACH(0) the process should remain `T (stopped)'
> > iff the process was `T (stopped)' before PTRACE_ATTACH.
> > - PTRACE_DETACH(0) should preserve `T (stopped)'.
>
> Hmm. OK, but I assume you meant "unless the tracee was resumed in between".

You described the exact behavior of current Fedora/RHEL gdb. But in general
I do not insist on it: one can, for example, run an inferior function call
during the investigation-under-SIGSTOP described above, and even in that case
one still wants to detach the application in the `T (stopped)' state.

Detaching the process as `T (stopped)' is not much of a problem, as the
app/user can send SIGCONT to it. But accidentally unstopping the process
during detach cannot be fixed or worked around.
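
For reference, the naive sequence under discussion (PTRACE_ATTACH,
wait->SIGSTOP, PTRACE_DETACH(0)) can be sketched as below. The function name
attach_wait_detach0() is made up for this mail. Whether the tracee is still
`T (stopped)' afterwards is exactly the kernel behavior being discussed, so
the sketch does not assume either outcome.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: PTRACE_ATTACH, wait for the stop, then
 * PTRACE_DETACH with signal 0. Returns the stop signal reported by
 * waitpid() (normally SIGSTOP), or -1 with errno set on failure. */
static int attach_wait_detach0(pid_t pid)
{
    int status, sig;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0)
        return -1;

    /* Options 0 is enough for a fork()ed child; a real debugger
     * would typically pass __WALL to cover clone()d threads too. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status)) {
        /* The task exited or was killed instead of stopping. */
        errno = ESRCH;
        return -1;
    }
    sig = WSTOPSIG(status);

    /* data == 0: do not inject any signal on detach. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0)
        return -1;
    return sig;
}
```

A tool calling this on a process that was `T (stopped)' beforehand should,
per the proposal, find it `T (stopped)' again afterwards.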


> But. Let me remind. PTRACE_DETACH(SIGXXX) does not always work as
> gdb thinks, SIGXXX can be ignored.

In that case it is a bug. This bug is probably why the "detach-stopped-rhel5"
ptrace-testsuite testfile uses tgkill(SIGSTOP)+PTRACE_DETACH(0), IIUC.


> > Personally I would keep it completely hidden from the debugger and only
> > remember the last SIGCONT vs. SIGSTOP for the case the session ends with
> > PTRACE_DETACH(0). Debugger/strace would not be able to display any externally
> > received SIGSTOP/SIGCONT. PTRACE_CONT(SIGSTOP) and PTRACE_CONT(SIGCONT)
> > should behave as PTRACE_CONT(0) to clean up compatibility with existing tools.
>
> Can't understand... could you explain?

A process is not in the `T (stopped)' state randomly. AFAIK it is there
because an engineer sent it SIGSTOP. Applications do not use SIGSTOP
themselves to get into `T (stopped)' during their execution.

And if the engineer sent SIGSTOP it was intentional. The engineer does not
want some tool to accidentally cancel his intentional SIGSTOP. When the
engineer decides so (s)he can send SIGCONT appropriately.

I see SIGSTOP as a hard stop, and thus even the tracers/debuggers of
the `T (stopped)' process should just get no response from it. I do not think
ptrace is a good tool for general system monitoring - to see any
SIGCONT/SIGSTOP deliveries - because (a) ptrace is limited to a single master
(a second PTRACE_ATTACH gets EPERM) and (b) ptrace control is not transparent
due to thread/race timing (on `t (tracing stop)'). For global system tracing
incl. the SIGCONT/SIGSTOP deliveries, fully transparent tools like systemtap
are more suitable.

Therefore if the debugger sends some SIGSTOP/SIGCONT those should rather be
ignored for compatibility reasons, as they may be either just bogus or used as
workarounds (such as in the FSF GDB PTRACE_ATTACH-SIGSTOP-trick) for ptrace
bugs, which should no longer be needed.



Thanks,
Jan