Re: Mutual debugging of 2 processes can stuck in unkillable stopped state

From: Igor Zhbanov
Date: Mon Mar 29 2021 - 13:11:13 EST


Hi Oleg!

I've tried both 5.3.18 and 5.10.0. The behavior is the same.
The important thing is to run "exec strace -p ..." in the second terminal
to create the loop A->B->A.

So the last line we see from the first strace is:
ptrace(PTRACE_SEIZE, 1990, NULL, PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT

I.e. it printed the syscall prior to its execution and hung after the
execution.

izh@suse2:~> ps awux|grep strace
izh 1891 0.0 0.0 24752 3828 pts/1 ts+ 19:52 0:00 strace -p 1990
izh 1990 0.0 0.0 24752 3628 pts/0 t+ 19:53 0:00 strace -p 1891

izh@suse2:~> kill 1990 1891
izh@suse2:~> kill -9 1990 1891

izh@suse2:~> sudo cat /proc/1891/stack
[sudo] password for root:
[<0>] ptrace_stop+0x14a/0x260
[<0>] ptrace_do_notify+0x91/0xb0
[<0>] ptrace_notify+0x4e/0x70
[<0>] do_exit+0x910/0xb70
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x124/0x800
[<0>] arch_do_signal_or_restart+0xa9/0x290
[<0>] exit_to_user_mode_prepare+0xe7/0x1a0
[<0>] syscall_exit_to_user_mode+0x18/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

izh@suse2:~> sudo cat /proc/1990/stack
[<0>] ptrace_stop+0x14a/0x260
[<0>] ptrace_do_notify+0x91/0xb0
[<0>] ptrace_notify+0x4e/0x70
[<0>] do_exit+0x910/0xb70
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x124/0x800
[<0>] arch_do_signal_or_restart+0xa9/0x290
[<0>] exit_to_user_mode_prepare+0xe7/0x1a0
[<0>] syscall_exit_to_user_mode+0x18/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

izh@suse2:~> cat /proc/1891/status
Name: strace
Umask: 0022
State: t (tracing stop)
Tgid: 1891
Ngid: 0
Pid: 1891
PPid: 1890
TracerPid: 1990
Uid: 1000 1000 1000 1000
Gid: 100 100 100 100
FDSize: 256
Groups: 100
NStgid: 1891
NSpid: 1891
NSpgid: 1891
NSsid: 1891
VmPeak: 24752 kB
VmSize: 24752 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 3828 kB
VmRSS: 3828 kB
RssAnon: 520 kB
RssFile: 3308 kB
RssShmem: 0 kB
VmData: 284 kB
VmStk: 132 kB
VmExe: 1108 kB
VmLib: 2828 kB
VmPTE: 80 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 4/15639
SigPnd: 0000000000000000
ShdPnd: 0000000000014100
SigBlk: 0000000000002000
SigIgn: 0000000000300000
SigCgt: 0000000180007007
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: vulnerable
SpeculationIndirectBranch: always enabled
Cpus_allowed: 7
Cpus_allowed_list: 0-2
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 1561
nonvoluntary_ctxt_switches: 7
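
For reference, the hex signal masks in the status output above can be decoded
with a short script. This is only a sketch: the signal table assumes the
Linux x86-64 numbering (the stack traces show a 64-bit x86 kernel), and
decode_sigmask is a name made up for illustration, not an existing tool.

```python
# Standard (non-realtime) signal numbers on Linux x86-64. This table is an
# assumption of the sketch; other architectures number some signals differently.
LINUX_X86_64_SIGNALS = {
    1: "SIGHUP", 2: "SIGINT", 3: "SIGQUIT", 4: "SIGILL", 5: "SIGTRAP",
    6: "SIGABRT", 7: "SIGBUS", 8: "SIGFPE", 9: "SIGKILL", 10: "SIGUSR1",
    11: "SIGSEGV", 12: "SIGUSR2", 13: "SIGPIPE", 14: "SIGALRM", 15: "SIGTERM",
    16: "SIGSTKFLT", 17: "SIGCHLD", 18: "SIGCONT", 19: "SIGSTOP", 20: "SIGTSTP",
    21: "SIGTTIN", 22: "SIGTTOU", 23: "SIGURG", 24: "SIGXCPU", 25: "SIGXFSZ",
    26: "SIGVTALRM", 27: "SIGPROF", 28: "SIGWINCH", 29: "SIGIO", 30: "SIGPWR",
    31: "SIGSYS",
}


def decode_sigmask(hex_mask):
    """Return the names of the signals set in a /proc/<pid>/status mask."""
    mask = int(hex_mask, 16)
    names = []
    for signum in range(1, 65):
        # Bit (signum - 1) of the mask corresponds to signal number signum.
        if mask & (1 << (signum - 1)):
            names.append(LINUX_X86_64_SIGNALS.get(signum, "SIG%d" % signum))
    return names


# ShdPnd value copied from the status output of PID 1891.
print(decode_sigmask("0000000000014100"))
```

With the ShdPnd value above this lists SIGKILL, SIGTERM and SIGCHLD, i.e. the
signals from both kill commands are queued as pending while the task still
reports the tracing stop.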

izh@suse2:~> cat /proc/1990/status
Name: strace
Umask: 0022
State: t (tracing stop)
Tgid: 1990
Ngid: 0
Pid: 1990
PPid: 1847
TracerPid: 1891
Uid: 1000 1000 1000 1000
Gid: 100 100 100 100
FDSize: 256
Groups: 100
NStgid: 1990
NSpid: 1990
NSpgid: 1990
NSsid: 1847
VmPeak: 24752 kB
VmSize: 24752 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 3628 kB
VmRSS: 3628 kB
RssAnon: 520 kB
RssFile: 3108 kB
RssShmem: 0 kB
VmData: 284 kB
VmStk: 132 kB
VmExe: 1108 kB
VmLib: 2828 kB
VmPTE: 88 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 4/15639
SigPnd: 0000000000000000
ShdPnd: 0000000000014100
SigBlk: 0000000000002000
SigIgn: 0000000000300000
SigCgt: 0000000180007007
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: vulnerable
SpeculationIndirectBranch: always enabled
Cpus_allowed: 7
Cpus_allowed_list: 0-2
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 180
nonvoluntary_ctxt_switches: 848
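
The A->B->A loop is visible in the two TracerPid fields above. A minimal
sketch of checking for such a loop by following TracerPid links is below.
The helper names (tracer_of, find_trace_loop, read_proc_status) are
hypothetical, and the sample data contains only the TracerPid values copied
from the two dumps.

```python
import re


def tracer_of(status_text):
    """Extract TracerPid from the text of /proc/<pid>/status (0 = not traced)."""
    match = re.search(r"^TracerPid:\s*(\d+)", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def find_trace_loop(start_pid, read_status):
    """Follow TracerPid links from start_pid.

    Returns the list of PIDs forming a loop, or None if the chain ends at an
    untraced process. read_status is any callable mapping a PID to the text
    of its status file.
    """
    chain = []
    pid = start_pid
    while pid != 0:
        if pid in chain:
            # We came back to a PID already visited: that suffix is the loop.
            return chain[chain.index(pid):]
        chain.append(pid)
        pid = tracer_of(read_status(pid))
    return None


def read_proc_status(pid):
    """Read the status file of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return f.read()


# Sample data: only the relevant lines from the two status dumps.
sample = {
    1891: "Name: strace\nState: t (tracing stop)\nTracerPid: 1990\n",
    1990: "Name: strace\nState: t (tracing stop)\nTracerPid: 1891\n",
}
print(find_trace_loop(1891, sample.__getitem__))
```

On a live system, find_trace_loop(pid, read_proc_status) would read the
status files directly instead of the sample dictionary.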

On 29.03.2021 19:49, Oleg Nesterov wrote:
> On 03/29, Igor Zhbanov wrote:
>>
>> Mutual debugging of 2 processes can stuck in unkillable stopped state
>
> can't reproduce and can't understand...
>
>> Hi!
>>
>> When one process, let's say "A", is tracing the another process "B", and the
>> process "B" is trying to attach to the process "A", then both of them are
>> getting stuck in the "t+" state. And they are ignoring all of the signals
>> including the SIGKILL,
>
> Why do you think so? What is your kernel version?
>
> "t" means TASK_TRACED, SIGKILL should wake it up and terminate.
>
>> so it is not possible to terminate them without
>> a reboot.
>>
>> To reproduce:
>> 1) Run two terminals
>> 2) Attach with "strace -p ..." from the first terminal to the shell (bash) of
>> the second terminal.
>> 3) In the second terminal run "exec strace -p ..." to attach to the PID of the
>> first strace.
>>
>> Then you'll see that the second strace is hanging without any output. And the
>> first strace will output following and hang too:
>> ptrace(PTRACE_SEIZE, 11795, NULL,
>> PTRACE_O_TRACESYSGOOD|PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT
>>
>> (The 11795 is the PID of the first strace itself.)
>>
>> And in the process list you will see following:
>> ps awux | grep strace
>> user 11776 0.0 0.0 24752 2248 pts/3 t+ 13:53 0:00 strace -p 11795
>> user 11795 0.0 0.0 24752 3888 pts/1 t+ 13:54 0:00 strace -p 11776
>
> OK, may be they sleep in PTRACE_EVENT_EXIT? After you tried to send SIGKILL?
>
> please show us the output from "cat /proc/{11795,11776}/stack". And
> "cat /proc/{11795,11776}/status" just in case.