Re: Question regarding ptrace work for Linux v3.1

From: Oleg Nesterov
Date: Mon Mar 21 2016 - 15:28:16 EST


On 03/21, Patrick Donnelly wrote:
>
> That seems to be the case but it will only report certain events (not
> syscalls). I have observed PTRACE_EVENT_EXIT and PTRACE_EVENT_CLONE
> events... Hmm, now that I think about this, it would be necessary to
> see the initial SIGSTOP (or PTRACE_EVENT_STOP) in order to initiate
> syscall tracing via PTRACE_SYSCALL. So that does seem to indicate the
> problem.

Yes, exactly, you need to see the initial SIGSTOP or another event which
can be reported before it.
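
A minimal sketch of that check, not taken from parrot. The helper names
(stop_event, could_be_initial_stop, begin_syscall_tracing) are hypothetical,
and the decoding assumes the usual Linux wait-status layout, where a ptrace
event code sits in bits 16 and up. A plain SIGSTOP stop or a PTRACE_EVENT_STOP
can also occur later in a tracee's life, so a real tracer would still need
per-tracee state to tell the first stop from later ones.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef PTRACE_EVENT_STOP
#define PTRACE_EVENT_STOP 128	/* fallback; value from the kernel uapi header */
#endif

/* Hypothetical helper: PTRACE_EVENT_* code carried by a wait status, or 0. */
int stop_event(int status)
{
	if (!WIFSTOPPED(status))
		return 0;
	return (status >> 16) & 0xffff;
}

/*
 * Hypothetical helper: could this status be the first stop of a new tracee?
 * Either a plain SIGSTOP (PTRACE_ATTACH, or auto-attach without SEIZE) or
 * PTRACE_EVENT_STOP (PTRACE_SEIZE).
 */
int could_be_initial_stop(int status)
{
	if (!WIFSTOPPED(status))
		return 0;
	if (stop_event(status) == PTRACE_EVENT_STOP)
		return 1;
	return stop_event(status) == 0 && WSTOPSIG(status) == SIGSTOP;
}

/*
 * Hypothetical helper: start syscall tracing only once such a stop has
 * been seen. Returns -1 without touching the tracee otherwise.
 */
long begin_syscall_tracing(pid_t tid, int status)
{
	if (!could_be_initial_stop(status))
		return -1;
	/* data == NULL: resume without injecting any signal */
	return ptrace(PTRACE_SYSCALL, tid, NULL, NULL);
}
```

If the initial stop is never reported, begin_syscall_tracing() is never
reached for that tracee, which would match seeing only events such as
PTRACE_EVENT_EXIT and PTRACE_EVENT_CLONE but no syscalls.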

> > To clarify, the usage of SIGSTOP in ptrace was always buggy by design.
> > For example, SIGCONT from somewhere can remove the pending (and not yet
> > reported) SIGSTOP, and this _can_ explain the problem you hit.
>
> The tree of processes being traced do not send any signals but an
> external process may have.

I am looking into

https://github.com/cooperative-computing-lab/cctools/blob/5ccb04599ba2ee125730981f53add80d98cf8161/parrot/src/pfs_main.cc

and this code

	case SIGSTOP:
		/* Black magic to get threads working on old Linux kernels... */

		if(p->nsyscalls == 0) { /* stop before we begin running the process */
			debug(D_DEBUG, "suppressing bootstrap SIGSTOP for %d",pid);
			signum = 0; /* suppress delivery */
			kill(p->pid,SIGCONT);
		}
		break;

doesn't look right. Note that kill(pid,SIGCONT) affects the whole thread-
group. So if this kill() races with another thread doing clone() you can
hit the problem you described.
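
One possible way to avoid the process-wide SIGCONT, sketched under the
assumption that suppressing the signal at resume time is enough on the
kernels of interest (the "old Linux kernels" comment suggests this may not
hold everywhere). The helper names signal_to_inject and resume_tracee are
hypothetical; nsyscalls mirrors p->nsyscalls from the snippet above. The only
ptrace behaviour relied on is that resuming with data == 0 drops the signal
for that one tracee.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: pick the signal to inject when resuming a tracee.
 * The bootstrap SIGSTOP (no syscalls seen yet) is suppressed by returning 0;
 * ptrace event stops never inject anything; other signals pass through.
 * Assumes the caller has already filtered out syscall-stops.
 */
int signal_to_inject(int status, unsigned long nsyscalls)
{
	int sig;

	if (!WIFSTOPPED(status))
		return 0;
	if ((status >> 16) != 0)	/* PTRACE_EVENT_* stop, not a real signal */
		return 0;

	sig = WSTOPSIG(status);
	if (sig == SIGSTOP && nsyscalls == 0)
		return 0;		/* suppress delivery, no kill(SIGCONT) */
	return sig;
}

/*
 * Hypothetical helper: resume exactly one thread. Unlike kill(pid, SIGCONT),
 * this does not touch the other threads in the group.
 */
long resume_tracee(pid_t tid, int status, unsigned long nsyscalls)
{
	long sig = signal_to_inject(status, nsyscalls);

	return ptrace(PTRACE_SYSCALL, tid, (void *)0, (void *)sig);
}
```

The point of the design is that the decision stays per-thread: nothing here
can remove a pending SIGSTOP from a sibling thread that was just created
by clone().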

> However, I did notice the use of futexes
> near these clones. Perhaps that may be causing this?

I don't think so.

> > But unless you use PTRACE_SEIZE the same can happen on v3.1 so it seems
> > there is something else.
>
> Okay, it might be that PTRACE_SEIZE fixes it.

Yes, but iiuc you do not see this problem on v3.1 even with PTRACE_ATTACH?

Oleg.