Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.
From: Eric W. Biederman
Date: Sun Apr 02 2017 - 17:12:45 EST
Oleg Nesterov <oleg@xxxxxxxxxx> writes:
> Perhaps I am wrong, but I think you underestimate the problems, and it is
> not clear to me if we really want this.
I worked through quite a bit of it and I realized a few fundamental
issues. The task struct must remain visible until it is reaped
and we use siglock to protect in unhash process to protect that reaping.
Further tsk->sighand == NULL winds up being a flag used to tell
if release_task has been called.
To get an usable count on sighand struct all that needed to be done
was to change the reference counting of sighand_struct to count
processes and not threads.
Which is what I wound up posting.
> Anyway, Eric, even if we can and want to do this, why we can't do this on
> top of my fix?
Because your reduction in scope of cred_guard_mutex is fundamentally
broken and unnecessary.
> I simply fail to understand why you dislike it that much. Yes it is not
> pretty, I said this many times, but it is safe in that it doesn't really
> change the current behaviour.
No it is not safe. And it promotes wrong thinking which is even more
dangerous.
I reviewed the code and cred_guard_mutex needs to cover what it covers.
> I am much more worried about 2/2 you didn't argue with, this patch _can_
> break something and this is obviously not good even if PTRACE_EVENT_EXIT
> was always broken.
I don't know who actually useses PTRACE_O_TRACEEXIT so I don't actually
know what the implications of changing it are. Let's see...
gdb - no
upstart - no
lldb - yes
strace - no
It looks like lldb is worth testing with your PTRACE_EVENT_EXIT change
to see if anything breaks.
I think we can get away with changing the exec case but it does look
worth testing. I hadn't realized you hadn't looked to see what was
using PTRACE_O_TRACEEXIT to see if any part of userspace cares.
Hmm. This is interesting. From the strace documentation:
> Tracer cannot assume that ptrace-stopped tracee exists. There are many
> scenarios when tracee may die while stopped (such as SIGKILL).
> Therefore, tracer must always be prepared to handle ESRCH error on any
> ptrace operation. Unfortunately, the same error is returned if tracee
> exists but is not ptrace-stopped (for commands which require stopped
> tracee), or if it is not traced by process which issued ptrace call.
> Tracer needs to keep track of stopped/running state, and interpret
> ESRCH as "tracee died unexpectedly" only if it knows that tracee has
> been observed to enter ptrace-stop. Note that there is no guarantee
> that waitpid(WNOHANG) will reliably report tracee's death status if
> ptrace operation returned ESRCH. waitpid(WNOHANG) may return 0 instead.
> IOW: tracee may be "not yet fully dead" but already refusing ptrace
> ops.
If delivering a second SIGKILL to a ptraced stopped processes will
make it continue we have a very interesting out..
When we stop in ptrace_stop we stop in TASK_TRACED == (TASK_WAKEKILL|__TASK_TRACED)
Delivery of a SIGKILL to that task has queue SIGKILL
and call signal_wake_up_state(t, TASK_WAKEKILL).
Which becomes wake_up_state(t, TASK_INTERRUPTIBLE | TASK_WAKEKILL)
Which wakes up the process.
So userspace can absolutely kill a processes in PTRACE_EVENT_EXIT
before the tracers find it.
Therefore we are only talking a quality of implementation issue
if we actually stop and wait for the tracer or not.
....
Which brings us to your PTRACE_EVENT_EXIT patch.
I think may_ptrace_stop is tested in the wrong place,
and is probably buggy.
- We should send the signal in all cases except when the ptracing parent
does not exist aka (!current->ptrace). The siginfo contains enough
information to understand what happened if anyone is listening.
- Then we should send the group stop.
- Then if we don't want to wait we should:
__set_current_state(TASK_RUNNING)
- Then we should drop the locks and only call freezable_schedule if
we want to wait.
That way userspace thinks someone else just sent a SIGKILL and killed
the thread before it had a chance to look (which is effectively what
we are doing). That sounds idea for both core-dumps and this case.
Eric