Re: [GIT PULL] Please pull proc and exec work for 5.7-rc1

From: Linus Torvalds
Date: Thu Apr 09 2020 - 20:31:05 EST


On Thu, Apr 9, 2020 at 2:17 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> So I'm not just good at "spotting odd corner cases". I told you why
> that bogus deadlock of yours failed - the execve was pointlessly
> waiting for a dead thread that had marked itself ptraced, and nobody
> was reaping it.

Side note: this does seem to be the only special case at least in this
particular area for execve().

The thing is, for actual _live_ threads that end up waiting for
ptracers, the SIGKILL that zap_other_threads() does should punch
through everything - it should not just break out anybody waiting for
the mutex, it should also break any "wait for tracer" ptrace waiting
conditions.

But already dead threads are special. Unlike SIGKILL'ed other threads,
they can stay around almost forever.

Because they're in Zombie state, and no amount of SIGKILL will do
anything to them (and zap_other_threads() didn't even try), and
they'll stay around until they are reaped by some entirely independent
party (which just happened to be the same one that was doing the
ptrace, but doesn't really have to be)

(Of course, who knows what other special cases there might be - I'm
not saying this is the _only_ special case, but in the particular area
of 'zap other threads and wait for them', already dead threads do seem
to be special).

So the fact that "zap_threads()" counts dead threads - but cannot do
anything about them - is fundamentally different, and that's why that
particular test-case has that odd behavior.

So I think we have basically three choices:

(1) have execve() not wait for dead threads while holding the cred
mutex (that's effectively what that zap_threads() hacky patch does,
but it's not really correct because it can cause notify_count
underflows)

(2) have zap_other_threads() force-reap Zombie threads

(3) say that it's a user space bug, and if you're doing PTRACE_ATTACH
you need to make sure there are no dead threads of the target that
might be blocking an execve().

For an example of that (3) approach: making the test-case just do a
waitpid() before the PTRACE_ATTACH will unhang it, because it reaps
that thread that did the PTRACE_TRACEME.

So option (3) is basically saying "that test-case is buggy, exactly
like the two readers on a pipe".

But I continued to look at (1), but trying to deal with the fact that
"notify_count" will get decremented not just by live threads, but by
the ones that already died too.

Instead of trying to change how notify_count gets decremented, we
could do something like the attached patch: wait for it to go down to
zero, yes, but go back and re-check until you don't have to wait any
more. That should fix the underflow situation. The comment says it
all.

This patch is 100% untested. It probably compiles, that's all I'll
say. I'm not proud of it. And I may be missing some important thing
(and I'm not happy about the magical "am I not the
thread_group_leader?" special case etc).

It's worth noting that this code is the only one that cares about the
return value of zap_other_threads(), so changing the semantics to only
count non-dead threads is safe in that sense.

Whether it's safe to then share the signal structure ever the rest of
exevbe() - even if it's only with dead threads - I didn't check.

I think Oleg might be the only person alive who understands all of our
process exit code.

Oleg? Comments?

Linus