Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

From: Bernd Edlinger
Date: Thu Apr 09 2020 - 11:15:28 EST


On 4/9/20 4:58 PM, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
>> On Wed, Apr 8, 2020 at 8:17 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>>
>>> Yes. I missed the fact that we could take the lock killable.
>>> We still unfortunately have the deadlock with ptrace.
>>
>> That, I feel, is similarly trivial.
>>
>> Again, anybody who takes the lock for writing should just do so
>> killably. So you have three cases:
>>
>> - ptrace wins the race and gets the lock.
>>
>> Fine, the execve will wait until afterwards.
>>
>> - ptrace loses the race and is not a thread with execve.
>>
>> Fine, the execve() won, and the ptrace will wait until after execve.
>>
>> - ptrace loses the race and is a thread with execve.
>>
>> Fine, the execve() will kill the thing in dethread() and the ptrace
>> thread will release the lock and die.
>
> That would be nice.
>
> That is unfortunately not how ptrace_event(PTRACE_EVENT_EXIT, ...) works.
>
> When a thread going about its ordinary business receives the SIGKILL
> from de_thread, it changes course and finds its way to do_exit.
> In do_exit the thread calls ptrace_event(PTRACE_EVENT_EXIT, ...) and
> blocks waiting for the tracer to let it continue.
>
> Further, from previous attempts to fix this we know that there
> are pieces of userspace that expect that stop to happen. So if the
> PTRACE_EVENT_EXIT stop does not happen, userspace which is already
> attached breaks.
>
> Further, this case with ptrace is something we know userspace
> does; it is just a matter of bad timing when attaching to the
> threads while one thread is exec'ing. So we don't even need to wonder
> whether userspace would do such a silly thing.
>
>
>
> There are a lot of similar cases that can happen if userspace inserts
> itself into the path of page faults, directly or indirectly,
> as long as some wait somewhere ultimately waits for a ptrace attach.
>
>

Remember, as a last resort there is my "insane" 15/16 patch, which
Linus admittedly hates, but it works. If we find a cleaner solution
it can always be reverted; that is just fine with me.

Thanks
Bernd.

>> So all three cases are fine, and none of them have any behavioral
>> differences (as mentioned, the killing is "invisible" to users since
>> it's fundamentally a race, and you can consider the kill to have
>> happened before the ptrace started).
>
> See above.
>
>
>>> It might be simpler to make whichever lock we are dealing with per
>>> task_struct instead of per signal_struct. Then we don't even have to
>>> think about what de_thread does or if the lock is taken killable.
>>
>> Well, yes, but I think the dethread behavior of killing threads is
>> required anyway, so..
>
> It is, but it is actually part of the problem.
>
> I think making some of this thread local might solve another easy case
> and let us focus more on the really hard problem.
>
>>> I keep wondering if we could do something similar to vfork. That is
>>> allocate a new task_struct and fully set it up for the post-exec
>>> process, and then make it visible under tasklist_lock. Finally we could
>>> free the old process.
>>>
>>> That would appear as if everything happened atomically from
>>> the point of view of the rest of the kernel.
>>
>> I do think that would have been a lovely design originally, and would
>> avoid a lot of things. So "execve()" would basically look like an exit
>> and a thread creation with the same pid (without the SIGCHLD to the
>> parent, obviously)
>>
>> That would also naturally handle the "flush pending signals" etc issues.
>>
>> The fact that we created a whole new mm-struct ended up fixing a lot
>> of problems (even if it was painful to do). This might be similar.
>>
>> But it's not what we've ever done, and I do suspect you'd run into a
>> lot of odd small special cases if we were to try to do it now.
>
> I completely agree, which is why I haven't been rushing to do that.
> But this remains the only idea that I have thought of that would solve all
> of the issues.
>
>> So I think it's simpler to just start making the "cred lock waiters
>> have to be killable" rule. It's not like that's a very complex rule.
>
> I just looked at the remaining users of cred_guard_mutex and they are
> all killable or interruptible. Further, all of the places that have been
> converted to use the exec_update_mutex are also all killable or
> interruptible.
>
> So where we came in is that we had the killable rule, and that is what
> has allowed this to remain on the back burner for so long. At least you
> could kill the affected process from userspace. Unfortunately the
> deadlocks still happen.
>
> Eric
>