RE: [RFC PATCH] set TASK_TRACED before arch_ptrace code to fix a race

From: Roland McGrath
Date: Tue Jun 03 2008 - 18:14:22 EST


> This might not be the same bug ... but I do have a definite 100%
> reproducible bug (latest git kernel, old version of strace (4.5.15-1.el4.1))

Please start a thread with a sensical subject line about that.

> Backtrace of pid 6443 (make)
>
> Call Trace:
> [<a0000001007069b0>] schedule+0x11f0/0x1380
> sp=e0000001b768fb40 bsp=e0000001b7680d58
> [<a000000100707800>] schedule_timeout+0x40/0x180
> sp=e0000001b768fb60 bsp=e0000001b7680d28
> [<a000000100706d60>] wait_for_common+0x220/0x380
> sp=e0000001b768fb90 bsp=e0000001b7680cd8
> [<a000000100706f00>] wait_for_completion+0x40/0x60
> sp=e0000001b768fbf0 bsp=e0000001b7680cb8
> [<a0000001000794d0>] do_fork+0x430/0x4a0
> sp=e0000001b768fbf0 bsp=e0000001b7680c60
> [<a00000010000a340>] sys_clone+0x60/0x80
> sp=e0000001b768fc20 bsp=e0000001b7680c10
> [<a00000010000a990>] ia64_trace_syscall+0xd0/0x110
> sp=e0000001b768fe30 bsp=e0000001b7680c10
> [<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
> sp=e0000001b7690000 bsp=e0000001b7680c10

This trace (do_fork->wait_for_completion) tells us this is a vfork call.
It is waiting for its child (presumably 6444) to exit or exec.
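
As a rough user-space illustration of that vfork behavior (a sketch only,
not taken from the reporter's setup; the function name
vfork_child_exit_code is made up for this example): the parent does not
get past vfork() until the child calls _exit() or one of the exec
functions. A child that stops before reaching either one leaves the parent
blocked, which is what the do_fork->wait_for_completion trace shows.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: vfork a child that exits immediately with `code`
 * and return the exit code the parent observes, or -1 on error.
 */
int vfork_child_exit_code(int code)
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0)
        _exit(code);    /* child: only _exit() or exec*() are safe here */

    /*
     * The parent reaches this point only after the child has called
     * _exit() (or exec). Until then it sleeps inside the kernel.
     */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling vfork_child_exit_code(7) should return 7 once the child has
exited.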

> Backtrace of pid 6444 (make)
>
> Call Trace:
> [<a0000001007069b0>] schedule+0x11f0/0x1380
> sp=e0000001b803fd60 bsp=e0000001b8030dd8
> [<a000000100097590>] ptrace_stop+0x2d0/0x380
> sp=e0000001b803fd80 bsp=e0000001b8030da0
> [<a000000100097c90>] get_signal_to_deliver+0x1d0/0x6a0
> sp=e0000001b803fd80 bsp=e0000001b8030d38
> [<a000000100034a10>] ia64_do_signal+0xb0/0xd00
> sp=e0000001b803fd80 bsp=e0000001b8030c90
> [<a000000100012c60>] do_notify_resume_user+0x100/0x180
> sp=e0000001b803fe20 bsp=e0000001b8030c60
> [<a00000010000b0c0>] notify_resume_user+0x40/0x60
> sp=e0000001b803fe20 bsp=e0000001b8030c10
> [<a00000010000aff0>] skip_rbs_switch+0xe0/0x110
> sp=e0000001b803fe30 bsp=e0000001b8030c10
> [<a000000000010740>] __kernel_syscall_via_break+0x0/0x20
> sp=e0000001b8040000 bsp=e0000001b8030c10

This is the normal trace for the child having received a signal and stopped
to tell ptrace about it (not a syscall tracing stop).
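
From the tracer's side, the two kinds of stop can usually be told apart by
the status that wait returns. The following is a minimal sketch under two
assumptions: the status layout is the Linux one, and the tracer has set
PTRACE_O_TRACESYSGOOD (without it a syscall stop reports plain SIGTRAP and
the status alone cannot distinguish it). The enum and function names are
invented for the example.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of a wait status seen by a tracer. */
enum stop_kind {
    NOT_STOPPED,    /* tracee exited or was killed */
    SYSCALL_STOP,   /* syscall entry/exit stop (needs TRACESYSGOOD) */
    EVENT_STOP,     /* PTRACE_EVENT_* stop (event number in bits 16+) */
    SIGNAL_STOP     /* tracee stopped to report a signal */
};

enum stop_kind classify_wait_status(int status)
{
    if (!WIFSTOPPED(status))
        return NOT_STOPPED;

    /* With PTRACE_O_TRACESYSGOOD, syscall stops report SIGTRAP | 0x80. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return SYSCALL_STOP;

    /* Event stops carry a nonzero event number above the signal byte. */
    if ((status >> 16) != 0)
        return EVENT_STOP;

    return SIGNAL_STOP;
}
```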

I think you need to look into what strace is doing. There is far too much
going on to know much of anything just from the kernel state where the
processes sit. In particular, look at the sequence of ptrace and wait calls
strace made. If the same strace (identical everything) behaved differently
with an older kernel, then compare the sequence of ptrace and wait calls
and see where it differs.
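
To make the idea of a comparable ptrace/wait sequence concrete, here is a
minimal tracer sketch. It is not what strace does internally; the function
name log_trace_events and the log format are assumptions made for this
example. It writes one line per ptrace resume and per wait result, without
PIDs, so logs from two kernels can be diffed.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv under ptrace and log each resume and each
 * wait result to `log`. Returns the number of stops seen after setup, or
 * -1 if tracing could not be set up or a ptrace/wait call failed.
 */
int log_trace_events(char *const argv[], FILE *log)
{
    int status, stops = 0, inject = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);         /* let the parent set options first */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Expect the initial SIGSTOP stop; anything else means no tracing. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;

    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    for (;;) {
        /* Resume until the next syscall stop or signal stop. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)inject) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        fprintf(log, "ptrace: PTRACE_SYSCALL sig=%d\n", inject);

        if (waitpid(pid, &status, 0) != pid)
            return -1;

        if (!WIFSTOPPED(status)) {
            fprintf(log, "wait: terminated status=0x%x\n", status);
            break;
        }

        stops++;
        inject = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            fprintf(log, "wait: syscall stop\n");
        } else if (WSTOPSIG(status) == SIGTRAP) {
            /* Typically the post-exec SIGTRAP; do not deliver it. */
            fprintf(log, "wait: SIGTRAP stop\n");
        } else {
            fprintf(log, "wait: signal stop sig=%d\n", WSTOPSIG(status));
            inject = WSTOPSIG(status);  /* pass the signal through */
        }
    }
    return stops;
}
```

Running this on the same command under both kernels and diffing the two
logs would show the first point where the stop sequence diverges. The -1
return covers environments where ptrace is not permitted.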


Thanks,
Roland