Re: SCO: "thread creation is about a thousand times faster than on

From: Linus Torvalds (torvalds@transmeta.com)
Date: Mon Aug 28 2000 - 01:20:27 EST


In article <20000827220921.E65631@sfgoth.com>,
Mitchell Blank Jr <mitch@sfgoth.com> wrote:
>Linus Torvalds wrote:
>
>> Now, if the question is "Ok, I'm a POSIX thread, and I want the stupid
>> POSIX behaviour where an execve() kills all other threads", then you
>> have to do it by hand. Something like "kill all other threads, and then
>> do the execve". It's not rocket science.
>
>Yes, but:
> * If we're going to give POSIX threads the illusion of a shared PID then
> to be strictly correct the exec'ed process needs to have that PID too.
> If we want to do this, then an execve from any random thread won't
> do.

Ahh. But this is exactly where I think your analogy is broken.

You need the illusion of a shared pid for _threads_. But we've already
agreed that a thread that does an execve() breaks apart from the other
threads, if for no other reason than the fact that it obviously doesn't
share the same VM. So we have a phase transition..

More importantly, once you approach the problem from the clone/rfork
approach, you understand that threads and processes are really the same
thing on a fundamental level. So you realize that the notion of a
"process ID" as separate from a "thread ID" is not a sensible notion. A
thread is a process is a thread, and it has a well-defined ID. It's not
"thread or process". It's both.

So rather, you could have the notion of a _grouping_ of these
threads/processes. UNIX had that notion long before threads came to be.
Sessions and process groups, with process group leadership and all that
entails. A grouping of lightweight processes isn't that different, now
is it? It's not the _same_ group, but it's the same concept.

So we realize that the whole notion of a "shared PID" is a broken
notion. It's not really a shared pid AT ALL, because obviously the
different lightweight processes have identities of their own. It's
purely a _grouping_ of identities, and as we already saw, execve()
already breaks the grouping by virtue of obviously breaking the VM
sharing.

So it's entirely and utterly internally consistent to just say that a
process breaks away from the thread grouping when it does an execve().
And it's entirely and utterly consistent to say that the _traditional_
"shared PID" is really nothing more than the group ID.

Very logical. Internally consistent. And not very confusing at all.

But the above only gives a "feeling" for what the right answer might be.

Now, if you also take into account how process group leadership in UNIX
has always worked (well, "always" obviously means since process groups
were introduced, which is however long ago), you also see the details.
It becomes clear that the thread leader pid is the one that becomes the
"thread group ID".
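To make the process group precedent concrete, here is a minimal sketch in
C using only standard POSIX calls. It shows that a process which starts its
own group gets a group ID equal to its own pid. The function name is made
up for illustration, and this demonstrates existing process groups only,
not any thread group interface.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child that calls setpgid(0, 0) ends up leading a process
 * group whose ID equals its own pid, 0 if not, and -1 on error. */
int new_group_id_is_leader_pid(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: start a new process group with ourselves as leader. */
        if (setpgid(0, 0) != 0)
            _exit(2);
        /* The group ID is now simply the leader's pid. */
        _exit(getpgrp() == getpid() ? 0 : 1);
    }

    /* Parent: collect the child's verdict from its exit status. */
    int status;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The same rule, "the group is named after its leader", is what the thread
group ID borrows.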

Which means that when a thread does an execve() and breaks apart into its
own "group of one", what really happens is that it suddenly gets its own
unique thread group ID - which is obviously the same as the unique pid it
got when it was created.

How do we reconcile this with "execve() doesn't change the pid"?

We don't. We don't have to. You'll see that the historical UNIX
behaviour wrt execve() not changing the process pid is just a special
case of the above: it's the special case of the thread group leader
doing the execve() - because each traditional "process" is nothing but
its own thread group leader (a rather lonely leader, admittedly, as
there's nobody else to actually lead ;).
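The rule can be written down as a toy model. This is only a sketch of the
bookkeeping described above: "struct task" and the model_* functions are
hypothetical names invented for the illustration, not kernel interfaces,
and nothing here performs a real fork, clone or execve.

```c
/* Toy model only: hypothetical names, not kernel data structures. */
struct task {
    int pid;   /* unique ID of this thread/process */
    int tgid;  /* thread group ID: the pid of the group leader */
};

static int next_pid = 100;  /* arbitrary starting point for the model */

/* A traditional process: a group of one that leads itself. */
struct task model_fork(void)
{
    struct task t;
    t.pid = next_pid++;
    t.tgid = t.pid;
    return t;
}

/* A new thread: it has its own pid but joins the creator's group. */
struct task model_clone_thread(const struct task *creator)
{
    struct task t;
    t.pid = next_pid++;
    t.tgid = creator->tgid;
    return t;
}

/* execve(): the caller leaves its group and becomes a group of one.
 * The pid itself never changes. */
void model_execve(struct task *t)
{
    t->tgid = t->pid;
}
```

For a task created by model_fork(), model_execve() changes nothing, which
is the historical "execve() doesn't change the pid" case. For a task
created by model_clone_thread(), the same call moves it into a new group
named after its own pid.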

Final piece of the puzzle: we use the (n+1) native threads approach to
do the n pthreads thing, so that none of the threads that a threaded
pthreads program will see is actually the thread group leader (that
single extra thread is the hidden administrator - the puppet master -
created at the first thread creation).

That way pthreads execve() will always reliably break the connection,
because the execve() will never be the historical special case of
"thread group == pid". Which is what you want to have, because you've
already broken the symmetry (and execve was so badly defined in pthreads
that changing that detail shouldn't be a big issue).
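A small sketch of why the (n+1) layout makes this reliable, again as a toy
model with hypothetical names ("struct lwp", make_pthreads_group and
count_visible_leaders are invented here and belong to no real threading
library). The hidden manager holds the leader pid, so no visible thread
ever satisfies "thread group == pid".

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not a real threading library API. */
struct lwp {
    int pid;   /* per-thread ID */
    int tgid;  /* thread group ID (the leader's pid) */
};

/* Fill workers[0..n-1] with the n threads the program sees, all grouped
 * under a hidden manager that holds the leader pid. Returns the manager. */
struct lwp make_pthreads_group(struct lwp *workers, size_t n, int first_pid)
{
    struct lwp manager = { first_pid, first_pid };
    for (size_t i = 0; i < n; i++) {
        workers[i].pid = first_pid + 1 + (int)i;
        workers[i].tgid = manager.tgid;
    }
    return manager;
}

/* Count visible threads for which execve() would hit the "leader"
 * special case (pid == tgid) and so would not leave the group. */
size_t count_visible_leaders(const struct lwp *workers, size_t n)
{
    size_t leaders = 0;
    for (size_t i = 0; i < n; i++)
        if (workers[i].pid == workers[i].tgid)
            leaders++;
    return leaders;
}
```

With the manager as leader the count is zero for any n, so an execve()
from any visible thread always leaves the group.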

Well-defined? Yes. Surprising? I don't think so. I think it follows
fairly naturally from just two assumptions: "threads and processes are
fundamentally the same thing" and "we want to be close to 'historical
UNIX' in spirit".

[ Alternative details: regular UNIX process group leadership also
  implies the notion of "orphaned" processes without a group leader.
  I'd rather not get into that when it comes to thread groups, but
  there's nothing wrong with the notion per se. I just don't think the
  need is there, like it is for process group leadership. But this
  issue is more of "a detail in the implementation" than a big design
  question. I don't care all that much either way, myself. ]

> * What happens if the execve fails? I suspect POSIX would want us to
> return an error code to that thread without interfering with the
> existing threads. But wait, we've already "kill -9"'ed them all.
> To get this right would definitely require kernel support. The
> kernel must atomically end all the other threads when it knows the
> execve isn't going to return an error.

Actually, I'd personally just prefer trying to find a loophole in POSIX.
Or do it in the loader process, so that the newly executed process
automatically kills any remnant threads before doing anything else. I
agree that it's a nasty thing to emulate - but it's also one of the
worst horrors of the pthreads process notion, and potentially a good
candidate for just saying "ok, you'd better learn the new rules".

Or you could have the master kernel thread and add some simple kernel
extensions - we already have "child died" signals and "parent died"
signals, we could have a "child executed" signal too and just be done
with it: the puppet master thread would get the "child executed"
notification and would just kill off all its threads.

Or you could extend a bit on the logic we already have for "vfork()",
which has some of the same notification issues (ie the vfork() parent
needs to be aware of when its child has released the VM space back to
it).

So there are tons of solutions to this kind of backwards compatibility
that don't require the actual braindead semantics of pthreads - but they
may require some creativity.

But no question about it: some of the pthreads semantics are not pretty
in a truly threaded environment. pthreads does tend to assume a fairly
strictly ordered life (which is natural considering the origins of it,
but it makes it painful when you have truly independent threads that
aren't all that strongly synchronized).

                        Linus