Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

From: Bernd Edlinger
Date: Thu Apr 09 2020 - 13:17:56 EST



On 4/9/20 7:03 PM, Eric W. Biederman wrote:
>
> Adding Oleg to the conversation if for no other reason that he can see
> it is happening.
>
> Oleg has had a test case where this can happen for years and nothing
> has come out as an obvious proper fix for this deadlock issue.
>

Just for reference, I used Oleg's test case,
and improved it a bit. The test case anticipates the
EAGAIN return code from PTRACE_ATTACH. This is likely
to change somehow.
If Linus's idea works, you will probably have to
look at adjusting the test expectations again.

I would still be surprised if any other solution works.


Bernd.

> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
>> On Thu, Apr 9, 2020 at 9:15 AM Linus Torvalds
>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> may_ptrace_stop() is supposed to stop the blocking exactly so that it
>>> doesn't deadlock.
>>>
>>> I wonder why that doesn't work..
>>>
>>> [ Goes and look ]
>>>
>>> Oh. I see.
>>>
>>> That ptrace_may_stop() only ever considered core-dumping, not execve().
>>>
>>> But if _that_ is the reason for the deadlock, then it's trivially fixed.
>>
>> So maybe may_ptrace_stop() should just do something like this
>> (ENTIRELY UNTESTED):
>>
>> struct task_struct *me = current, *parent = me->parent;
>>
>> if (!likely(me->ptrace))
>> return false;
>>
>> /* If the parent is exiting or core-dumping, it's not
>> listening to our signals */
>> if (parent->signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP))
>> return false;
>>
>> /* if the parent is going through a execve(), it's not listening */
>> if (parent->signal->group_exit_task)
>> return false;
>>
>> return true;
>>
>> instead of the fairly ad-hoc tests for core-dumping.
>>
>> The above is hand-wavy - I didn't think a lot about locking.
>> may_ptrace_stop() is already called under the tasklist_lock, so the
>> parent won't change, but maybe it should take the signal lock?
>>
>> So the above very much is *not* meant to be a "do it like this", more
>> of a "this direction, maybe"?
>>
>> The existing code is definitely broken. It special-cases core-dumping
>> probably simply because that's the only case people had realized, and
>> not thought of the execve() thing.
>
>
> I don't see how there can be a complete solution in may_ptrace_stop.
>
> a) We must stop in PTRACE_EVENT_EXIT during exec or userspace *breaks*.
>
> Those are the defined semantics and I believe it is something
> as common as strace that depends on them.
>
> b) Even if we added a test for our ptrace parent blocking in a ptrace
> attach of an ongoing exec, it still wouldn't help.
>
> That ptrace attach could legitimately come after the thread in
> question has stopped and notified it's parent it is stopped.
>
>
>
> None of this is to say I like the semantics of PTRACE_EVENT_EXIT. It is
> just we will violate the no regressions rule if we don't stop there
> during exec.
>
> The normal case is that the strace or whomever is already attached to
> all of the threads during exec and no deadlock occurs. So the current
> behavior is quite usable.
>
>
>
> Maybe my memory is wrong that userspace cares but I really don't think
> so.
>
>
> Eric
>