Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1
From: Eric W. Biederman
Date: Thu Apr 09 2020 - 13:06:51 EST
Adding Oleg to the conversation if for no other reason that he can see
it is happening.
Oleg has had a test case where this can happen for years and nothing
has come out as an obvious proper fix for this deadlock issue.
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> On Thu, Apr 9, 2020 at 9:15 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> may_ptrace_stop() is supposed to stop the blocking exactly so that it
>> doesn't deadlock.
>>
>> I wonder why that doesn't work..
>>
>> [ Goes and look ]
>>
>> Oh. I see.
>>
>> That ptrace_may_stop() only ever considered core-dumping, not execve().
>>
>> But if _that_ is the reason for the deadlock, then it's trivially fixed.
>
> So maybe may_ptrace_stop() should just do something like this
> (ENTIRELY UNTESTED):
>
> struct task_struct *me = current, *parent = me->parent;
>
> if (!likely(me->ptrace))
> return false;
>
> /* If the parent is exiting or core-dumping, it's not
> listening to our signals */
> if (parent->signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP))
> return false;
>
> /* if the parent is going through a execve(), it's not listening */
> if (parent->signal->group_exit_task)
> return false;
>
> return true;
>
> instead of the fairly ad-hoc tests for core-dumping.
>
> The above is hand-wavy - I didn't think a lot about locking.
> may_ptrace_stop() is already called under the tasklist_lock, so the
> parent won't change, but maybe it should take the signal lock?
>
> So the above very much is *not* meant to be a "do it like this", more
> of a "this direction, maybe"?
>
> The existing code is definitely broken. It special-cases core-dumping
> probably simply because that's the only case people had realized, and
> not thought of the execve() thing.
I don't see how there can be a complete solution in may_ptrace_stop.
a) We must stop in PTRACE_EVENT_EXIT during exec or userspace *breaks*.
Those are the defined semantics and I believe it is something
as common as strace that depends on them.
b) Even if we added a test for our ptrace parent blocking in a ptrace
attach of an ongoing exec, it still wouldn't help.
That ptrace attach could legitimately come after the thread in
question has stopped and notified it's parent it is stopped.
None of this is to say I like the semantics of PTRACE_EVENT_EXIT. It is
just we will violate the no regressions rule if we don't stop there
during exec.
The normal case is that the strace or whomever is already attached to
all of the threads during exec and no deadlock occurs. So the current
behavior is quite usable.
Maybe my memory is wrong that userspace cares but I really don't think
so.
Eric