Re: [RFC PATCH] set TASK_TRACED before arch_ptrace code to fix a race

From: Petr Tesarik
Date: Tue Jun 03 2008 - 10:34:23 EST


Petr Tesarik wrote:
> Luming Yu wrote:
>> On Tue, May 27, 2008 at 2:25 PM, Petr Tesarik <ptesarik@xxxxxxx> wrote:
>>> On Mon, 2008-05-26 at 23:12 -0700, Roland McGrath wrote:
>>>>> [<a00000010000a9f0>] skip_rbs_switch+0xe0/0x110
>>>>> sp=e000000141c9fe30 bsp=e000000141c90cf8
>>>>> [<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
>>>>> sp=e000000141ca0000 bsp=e000000141c90cf8
>>> Indeed, there seems to be a large hole here. So, this is either a bug in
>>> the unwinder, or a bug in the RBS synchronization, which causes
>>> corruption. My test machine currently needs some work to run 2.6.25
>>> again, but I'll try your test case as soon as I re-install it later this
>>> week.
>> Just want to check if the test case works for you?
>
> Yes, the test case hangs here too. But the problem seems to be
> elsewhere. Did you look into the strace output? This line is pretty
> suspicious:
>
> 3258 clone2(child_stack=0, stack_size=0,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x200000000004e290) = 1
>
> Obviously, strace cannot attach PID 1, and since it is not designed to
> handle this situation, it hangs. I'm going to investigate why the return
> value of the clone2 syscall is seen as 1 by the tracer. Might even turn
> out to be a bug in strace...

It's definitely a bug in strace. For some reason (which I don't care
about), the execve() syscall produces an extra notification. However,
this notification is suppressed when SIGTRAP is blocked. This explains
why the test case fails only when SIGTRAP is blocked.

Now, you may ask why it only fails on ia64 and not on i386 or x86_64.
Well, I was so good that I even looked into the strace sources to make
sure. For i386 and x86_64, the value of EAX/RAX is checked for -ENOSYS
in syscall_fixup(), whereas for ia64 the first ptrace() stop after an
execve() is unconditionally ignored; see the code in get_scno().
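
The difference between the two approaches can be modelled roughly as
follows. This is a simplified sketch, not the strace source: struct
tracee, x86_style_accept() and ia64_style_accept() are hypothetical
names, and the real logic in syscall_fixup() and get_scno() has more
cases than shown here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-tracee state; not strace's real data structure. */
struct tracee {
    bool after_execve; /* an execve() was just seen for this tracee */
};

/* i386/x86_64-style check (simplified): treat the stop as a syscall
 * entry only if the return-value register holds -ENOSYS. The decision
 * is based on what the stop actually looks like. */
bool x86_style_accept(long ret_reg)
{
    return ret_reg == -ENOSYS;
}

/* ia64-style check (simplified): drop the first stop after execve()
 * no matter what it is, then accept everything that follows. */
bool ia64_style_accept(struct tracee *t)
{
    if (t->after_execve) {
        t->after_execve = false;
        return false; /* ignored unconditionally */
    }
    return true;
}
```

In this model, when the extra notification never arrives (SIGTRAP
blocked), the first stop after execve() is a genuine one, and
ia64_style_accept() still throws it away, while x86_style_accept()
looks at the register and keeps it.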

I don't know why Luming's fix helps here, but, please, fix strace, don't
introduce weird behaviour in the kernel.

The only thing I'm willing to talk about is why the extra notification
message is sent, and how userspace (strace) is supposed to recognize it.
FWIW, here is the backtrace (the SystemTap probe was at
__group_send_sig_info):

0xa0000001000b1a60 : __group_send_sig_info+0x0/0x180 []
0xa0000001000b1e30 : do_notify_parent_cldstop+0x250/0x2c0 []
0xa0000001000b2230 : ptrace_stop+0x2b0/0x3c0 []
0xa0000001000b5200 : get_signal_to_deliver+0x200/0xa40 []
0xa000000100035920 : ia64_do_signal+0xa0/0xee0 []
0xa000000100014b60 : do_notify_resume_user+0x100/0x160 []
0xa00000010000d040 : notify_resume_user+0x40/0x60 []
0xa00000010000cf40 : skip_rbs_switch+0xf0/0x150 []
0xa000000000010620 : __kernel_syscall_via_break+0x0/0x20 []

Regards,
Petr Tesarik
