Re: Question regarding ptrace work for Linux v3.1

From: Patrick Donnelly
Date: Wed Mar 23 2016 - 10:12:39 EST


On Mon, Mar 21, 2016 at 3:35 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 03/21, Patrick Donnelly wrote:
>> On Mon, Mar 21, 2016 at 3:07 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>> > case SIGSTOP:
>> >         /* Black magic to get threads working on old Linux kernels... */
>> >
>> >         if(p->nsyscalls == 0) { /* stop before we begin running the process */
>> >                 debug(D_DEBUG, "suppressing bootstrap SIGSTOP for %d",pid);
>> >                 signum = 0; /* suppress delivery */
>> >                 kill(p->pid,SIGCONT);
>> >         }
>> >         break;
>> >
>> > doesn't look right. Note that kill(pid,SIGCONT) affects the whole thread-
>> > group. So if this kill() races with another thread doing clone() you can
>> > hit the problem you described.
>>
>> You're right, that should be tkill! I will give that a try and report
>> back if that solved the issue for our collaborators...
>
> Ah, sorry, I should have mentioned this...
>
> No, tkill() won't help. See prepare_signal(), SIGCONT always removes
> the SIG_KERNEL_STOP_MASK signals from all threads, no matter if it was
> sent by tkill() or kill().
>
> Perhaps you should just remove this kill(SIGCONT) ?
>
> tracer_continue(signr => 0) should equally suppress the delivery. To
> clarify, this won't be right either, but without PTRACE_SEIZE you simply
> can't write the code which handles the stop/cont/etc events correctly
> anyway...

Thanks so much, Oleg. Indeed this was the problem.
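
For the archive, here is a minimal sketch of the approach suggested above:
drop the kill(SIGCONT) and suppress the bootstrap SIGSTOP only by passing
signal 0 when resuming the tracee. This is not our actual tracer code. The
function name and the seen_bootstrap flag are hypothetical, and the flag
stands in for the p->nsyscalls == 0 check. It uses plain PTRACE_TRACEME and
PTRACE_CONT, so it shares the limitations noted above for tracers that do
not use PTRACE_SEIZE.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only. Forks a child that asks to
 * be traced, stops itself with SIGSTOP, and then exits with exit_code.
 * Returns the child's exit status, or -1 on error. */
int trace_child_suppressing_bootstrap_stop(int exit_code)
{
    int seen_bootstrap = 0; /* stand-in for the p->nsyscalls == 0 check */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become a tracee, then stop so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        _exit(exit_code);
    }

    for (;;) {
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return -1;
        if (WIFSTOPPED(status)) {
            int signum = WSTOPSIG(status);
            if (signum == SIGSTOP && !seen_bootstrap) {
                /* Suppress delivery by resuming with signal 0.
                 * No kill(pid, SIGCONT): SIGCONT acts on the whole
                 * thread group and can race with clone(). */
                signum = 0;
                seen_bootstrap = 1;
            }
            /* Any other signal is passed through to the tracee. */
            if (ptrace(PTRACE_CONT, pid, NULL, (void *)(long)signum) < 0)
                return -1;
        }
    }
}
```

Calling trace_child_suppressing_bootstrap_stop(42) should return 42 once the
child has been resumed past its initial stop and has exited normally.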

--
Patrick Donnelly