Re: Linux threads -- as seen in NT Magazine

Colin Plumb (colin@nyx.net)
Wed, 9 Dec 1998 21:07:37 -0700 (MST)


Richard Fish wrote:
> The author (Mark Russinovich) and I had a bit of an email exchange
> yesterday regarding this article.

> He is actually much more knowledgeable about the state of threading
> support in the current kernel than I expected. He is aware of clone, and
> the threads implementation in glibc2. He was NOT aware that many (most?)
> of the other system libraries (such as Xlib) can be compiled to be
> thread-safe, and are distributed that way by some distributions.

Me too. And I got Alan's opinions on these points as well, so perhaps I
should repeat them here to save Alan the trouble.

On the points raised:
< 1 - IO operations are still serialized at a very high level in the
< kernel. For example, it does not seem possible to read from a disk file
< at the same time as a write() is being processed to a socket. Is this
< correct, or did we miss some magic in the (un)lock-kernel() calls?

> Mark Russinovich wrote:
>> No, I'm referring to 2.1.131. There are some paths that use finer-grained
>> locks, but the major ones (e.g. read, write) do not and still are
>> non-reentrant. Those are the most critical, I think you'll agree.

And Alan replied:
> Look harder. The critical paths in time terms are the physical I/O layer
> which is threaded and the irq layer (which is doing the back end of the
> physical I/O). IRQs are distributed across all processors and load shared
> without major lock interaction. The VFS drops the locks at every point it
> waits for I/O. So it's either very fast or asleep. And in the asleep case
> it's not enforcing threading.

Yes, sys_read() does lock_kernel() first thing. It then checks the
page cache for data, calls the file system, and drops the lock when it
blocks on I/O.

It hasn't been fixed because it's not a bottleneck. Not even on
a 14-processor SPARC.
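
To make the "very fast or asleep" point concrete, here is a user-space
analogue of that locking pattern in C. It is only a sketch (global_lock
and cache_lookup are made-up names for illustration, not kernel code),
but it shows the shape: hold the big lock only for the in-memory fast
path, and release it before sleeping on real I/O.

    #include <pthread.h>
    #include <unistd.h>
    #include <sys/types.h>

    static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Stand-in for the page-cache check; always "misses" in this sketch. */
    static ssize_t cache_lookup(int fd, void *buf, size_t count)
    {
        (void)fd; (void)buf; (void)count;
        return 0;
    }

    /* Hold the big lock only for the fast, in-memory path; drop it
     * before sleeping on real I/O. */
    ssize_t locked_read(int fd, void *buf, size_t count)
    {
        ssize_t n;

        pthread_mutex_lock(&global_lock);   /* the "lock_kernel()" step  */
        n = cache_lookup(fd, buf, count);   /* fast path: data in cache  */
        if (n > 0) {
            pthread_mutex_unlock(&global_lock);
            return n;
        }
        pthread_mutex_unlock(&global_lock); /* drop it before...         */
        return read(fd, buf, count);        /* ...blocking on real I/O   */
    }

    int main(void)
    {
        char buf[64];
        ssize_t n = locked_read(0, buf, sizeof(buf));  /* read from stdin */
        return n < 0;
    }

While the thread is asleep in the slow path it holds no lock at all,
which is exactly why the serialization at the top of sys_read() isn't
the bottleneck it might appear to be.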

< 2 - Since the kernel does not have any notion of threads, it cannot
< optimize things such as memory management. To quote Mr. Russinovich
< directly:

<< I'm referring to the OS correct management of the physical memory
<< that has been assigned to the set of clones - it will trim/expand
<< the memory under the assumption that each is separate. This will
<< not cause incorrect behavior, but may introduce inefficiency and
<< is further evidence that the thread support has only begun.

< Since I know nothing about memory management algorithms, I cannot say
< what, if any, efficiency gains there could be by having a notion of
< threads...others are invited to speak. :)

Again, to quote Alan first,

> Wrong. The memory management is driven by access rates to page histories. It
> automatically adjusts to reflect shared page usage by threads therefore. The
> rest of the OS really doesn't need to care. A process is a thread is a
> process. If I create two threads or two processes the top level scheduling
> is the same and should be.

As far as I can tell, the memory management doesn't even know about
*processes*. It just knows address spaces. Why should it care what's
using them? It just cares *how* they're being used.
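
For anyone who hasn't looked at clone(), here is a minimal sketch of
why "a process is a thread is a process" at that level: with CLONE_VM
the child shares the parent's address space, so the VM layer just sees
one address space with two users. This is only an illustration, not how
glibc's thread library actually calls clone(); the stack size and flag
set are picked for the example.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    static int shared_counter = 0;          /* visible to both tasks */

    static int child_fn(void *arg)
    {
        (void)arg;
        shared_counter = 42;                /* writes the parent's memory */
        return 0;
    }

    int main(void)
    {
        const size_t stack_size = 64 * 1024;
        char *stack = malloc(stack_size);
        if (!stack) {
            perror("malloc");
            return 1;
        }

        /* The stack grows down on most architectures, so pass the top. */
        pid_t pid = clone(child_fn, stack + stack_size,
                          CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
        if (pid == -1) {
            perror("clone");
            return 1;
        }

        waitpid(pid, NULL, 0);
        printf("shared_counter = %d\n", shared_counter);  /* prints 42 */
        free(stack);
        return 0;
    }

Drop CLONE_VM and the child gets its own copy of the address space,
i.e. an ordinary fork-style process; the scheduler and the memory
manager treat both cases the same way.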

< 3 - Not directly related to threading support, but more of a "supporting
< enterprise-level applications" issue: The current kernel does not
< support asynchronous IO. E.g., there is no mechanism to issue write()s on
< several file descriptors and get notified of the status of those
< write()s at a later time.
<
< The main goal of the async IO (at least, as far as this discussion is
< concerned) is to avoid having a separate thread for each connected
< client in a server process. One thread per CPU is the most efficient
< state for a server process, because it avoids context switches. Async
< IO would allow each thread to service multiple clients.
<
< My response to the above is that similar functionality could be gained
< with a combination of select() and non-blocking IO. However, this
< combination would seem to generate excessive context switches and
< user<->kernel space transfers, and non-blocking IO is not particularly
< useful for disk files. It does have the advantage of getting immediate
< error reporting however...
<
< So, are context switches so efficient that the one-thread-per-client
< model is the best way to go, and there would be no real performance gain
< in implementing async IO? If not, perhaps this should be a goal of the
< 2.3 series...

Again, Alan first:

> 2.1.x has completion ports, of course it's done in a clever and clean way
> (POSIX rt signalling) and sigwaitinfo(). You are correct that select() is
> in general bad for scaling. However, modern work (Banga et al.) has it
> basically outperforming the completion port models for very large task
> sets.

> All network I/O is asynchronously buffered. There is no need for asynchronous
> I/O in kernel space. glibc 2.1 provides the full POSIX real time asynchronous
> I/O API entirely in user space built on clone(). That's another chunk of code
> we don't have to put in the OS kernel for no actual performance change.

(For anyone who cares, the "Banga" references can be found at
http://www.cs.rice.edu/~gaurav/papers/index.htm)
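
For concreteness, here is a rough sketch of the completion-port style
Alan is describing: queue a POSIX AIO request that announces completion
with a real-time signal, then collect completions with sigwaitinfo().
The file name, buffer, and single-request structure are invented for
the example, error handling is abbreviated, and on glibc you link with
-lrt.

    #include <aio.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char msg[] = "hello, async world\n";
        int fd = open("/tmp/aio-demo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Block the completion signal so sigwaitinfo() can pick it up. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGRTMIN);
        sigprocmask(SIG_BLOCK, &set, NULL);

        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = msg;
        cb.aio_nbytes = sizeof(msg) - 1;
        cb.aio_offset = 0;
        cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL;
        cb.aio_sigevent.sigev_signo = SIGRTMIN;
        cb.aio_sigevent.sigev_value.sival_ptr = &cb;  /* identifies the request */

        if (aio_write(&cb) != 0) {
            perror("aio_write");
            return 1;
        }

        /* One thread can park here and service many outstanding requests. */
        siginfo_t info;
        if (sigwaitinfo(&set, &info) == SIGRTMIN) {
            struct aiocb *done = info.si_value.sival_ptr;
            int err = aio_error(done);
            ssize_t n = aio_return(done);
            printf("request %p finished: status %d, %zd bytes\n",
                   (void *)done, err, n);
        }

        close(fd);
        return 0;
    }

A real server would keep many aiocbs outstanding and loop on
sigwaitinfo(), using sival_ptr to find out which request finished,
which is exactly the completion-port shape without any extra kernel
machinery.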

Anyway, async I/O exists, and if there are serious performance problems,
the kernel will be tuned or enhanced. (You think this bunch of speed
freaks can resist?)

-- 
	-Colin
