Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

From: Pedro Alves
Date: Thu Nov 19 2015 - 11:50:01 EST


On 11/17/2015 07:34 PM, Oleg Nesterov wrote:
> On 11/16, Tejun Heo wrote:

>>> And perhaps we can simply remove this logic? I forgot why do we hide this
>>> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
>>> vague feeling tells me that we discussed this before and perhaps it was me
>>> who suggested to avoid the user-visible change when you introduced this
>>> transition...
>>
>> Heh, it was too long ago for me to remember much. :)
>
> Same here...
>
>>> Anyway, now I do not understand why do we want to hide it. Lets consider
>>> the following "test-case",
>>>
>>> void test(int pid)
>>> {
>>> kill(pid, SIGSTOP);
>>> waitpid(pid, NULL, WSTOPPED);
>>>
>>> ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
>>>
>>> assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
>>> }
>>>
>>> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
>>> if SIGCONT comes before ATTACH, so perhaps we do not really care?
>>>
>>> Jan, Pedro, do you think the patch below can break gdb somehow? With this
>>> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
>>> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
>>> tracee was TASK_STOPPED before attach.

Not sure, because I don't think I fully understand that proposed change.

Both GDB and gdbserver have special processing for attaching to already-stopped
processes. (and neither use PTRACE_SEIZE yet.)

Here's the gdbserver version:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/gdbserver/linux-low.c;h=41ab510fa4ac5654f101f08efb68e26b5bc5dbd7;hb=HEAD#l903

Copied here for convenience:

907 linux_attach_lwp (ptid_t ptid)
908 {
909 struct lwp_info *new_lwp;
910 int lwpid = ptid_get_lwp (ptid);
911
912 if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0)
913 != 0)
914 return errno;
915
916 new_lwp = add_lwp (ptid);
917
918 /* We need to wait for SIGSTOP before being able to make the next
919 ptrace call on this LWP. */
920 new_lwp->must_set_ptrace_flags = 1;
921
922 if (linux_proc_pid_is_stopped (lwpid))
923 {
924 if (debug_threads)
925 debug_printf ("Attached to a stopped process\n");
926
927 /* The process is definitely stopped. It is in a job control
928 stop, unless the kernel predates the TASK_STOPPED /
929 TASK_TRACED distinction, in which case it might be in a
930 ptrace stop. Make sure it is in a ptrace stop; from there we
931 can kill it, signal it, et cetera.
932
933 First make sure there is a pending SIGSTOP. Since we are
934 already attached, the process can not transition from stopped
935 to running without a PTRACE_CONT; so we know this signal will
936 go into the queue. The SIGSTOP generated by PTRACE_ATTACH is
937 probably already in the queue (unless this kernel is old
938 enough to use TASK_STOPPED for ptrace stops); but since
939 SIGSTOP is not an RT signal, it can only be queued once. */
940 kill_lwp (lwpid, SIGSTOP);
941
942 /* Finally, resume the stopped process. This will deliver the
943 SIGSTOP (or a higher priority signal, just like normal
944 PTRACE_ATTACH), which we'll catch later on. */
945 ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0);
946 }
947
948 /* The next time we wait for this LWP we'll see a SIGSTOP as PTRACE_ATTACH
949 brings it to a halt.
950

linux_proc_pid_is_stopped checks whether the state in /proc/pid/status is "T (stopped)".

Here's the equivalent in gdb:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=841ec3949c37438dfba924d8db6b37ffc416dd29;hb=HEAD#l974

This queuing of a SIGSTOP + PTRACE_CONT was necessary because
otherwise when gdb attaches to a job stopped process, gdb would hang in the waitpid
after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive.

If the proposed change makes it so that a new intermediate state can be observed
right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false,
then there's potential for breakage. But maybe not, if we're sure that
that when that happens, waitpid returns for the initial
PTRACE_ATTACH-induced SIGSTOP.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/