Re: Asynchrony (was Re: Locking a process or thread onto a specific CPU)

David Wragg (dpw@doc.ic.ac.uk)
22 Feb 1999 03:44:11 +0000


Jamie Lokier <lkd@tantalophile.demon.co.uk> writes:
> David Wragg wrote:
> [snip]
> > > The intention is (1) that a typical fast "schedule calling 50,000
> > > objects' `think()' methods" list+heap scheduler runs most of the objects
> > > efficiently (without individual stack frames); (2) some processing
> > > continues even while paging;
> >
> > Your sleeping-process-signals-another-process idea gives this, and is
> > easy to implement. (And possibly could be used for a nice
> > implementation of POSIX aio).
>
> It's not quite enough unfortunately. The problem of what to do when a
> blocked process unblocks is tricky, if you want performance as well as
> nice implementation.

Could you expand on this, please; I don't see the difficulty.

> [snip]
> From what I have seen, this means LinuxThreads performance has ways to
> go before its performance is good enough. I think (but I'm not certain
> yet), that the mechanisms we've been talking about can be used to get
> that performance and still have pthreads-like semantics. ]

Strictly speaking, it is true that a user-space thread context switch
can approach the cost (or rather cheapness) of a procedure call. I
think this is an important observation; it means that programs could
be structured as interacting processes, with huge numbers of these
processes, with little or no overhead compared to the traditional
procedural paradigm.
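
To make that concrete, here is a minimal sketch of a user-level switch
using the POSIX ucontext(3) calls (just one way to do it; a real
threads package would use hand-written assembler). Apart from setting
up the stack, each swapcontext() is essentially a register save and
restore -- although note that, at least with glibc, it also saves the
signal mask, which itself costs a system call, an instance of the Unix
cruft mentioned below.

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, thr_ctx;

/* Body of the "user thread": after setup, control moves back and
   forth with plain swapcontext() calls -- a register save/restore,
   no scheduler and no kernel entry for the switch itself. */
static void thread_body(void)
{
    int i;
    for (i = 0; i < 3; i++) {
        printf("user thread, pass %d\n", i);
        swapcontext(&thr_ctx, &main_ctx);   /* "yield" back to main */
    }
}

int main(void)
{
    int i;
    char *stack = malloc(64 * 1024);        /* private stack for it */

    getcontext(&thr_ctx);
    thr_ctx.uc_stack.ss_sp = stack;
    thr_ctx.uc_stack.ss_size = 64 * 1024;
    thr_ctx.uc_link = &main_ctx;            /* where to go if it returns */
    makecontext(&thr_ctx, thread_body, 0);

    for (i = 0; i < 3; i++)
        swapcontext(&main_ctx, &thr_ctx);   /* switch into the thread */

    free(stack);
    return 0;
}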

However, it is not a conclusive reason to think that a pthreads
implementation with many user-level threads per kernel task will have
great benefits for many programs. There are a number of reasons why
the implication is dubious:

o That comparison takes a very narrow view of the cost of a context
switch -- the cost of saving the registers of one thread, then
restoring those of the next. There are two other important costs:

- The cost of scheduling decisions. There has to be a decision
about which thread to switch to.

- The cost of cache misses following a context switch. This is
highly application dependent, but can be the most significant
cost.

Kernel context switches suffer from these costs too, of course, but
the upshot is that the ratio between the two isn't quite as great as
it might seem (there is a rough benchmark sketch after this list).

o The last point applies to threading in general. For a pthreads
implementation, there is a lot of Unix cruft that has to be taken
care of on each switch: signals, itimers, etc. (these can involve
system calls on rare occasions, but it's the cost of the various
checks that is really significant). The per-thread data structure
has to be relatively big, and each thread has a stack allocated. In
other words, user-space threads tend to become more like kernel
tasks in terms of resource usage, rather than minimal user-space
threads. And all of this is implemented in some form both in the
kernel and in user space, so there are two lots of code polluting
the cache.

o Even so, user-thread context switches should still be cheaper than
kernel-task switches. However, certain kinds of user-thread switches
will still involve trips into the kernel: switches due to a
time-slice expiring, and switches due to system calls that would
block. The only switches that don't involve the kernel are those due
to user-level synchronization mechanisms (mutexes, condition vars,
call them what you will). I am skeptical that typical pthreads
programs (including things like heavily threaded network servers)
have lots of threads blocking on user-level synchronization
mechanisms (though I must admit, I don't have hard evidence for
this). Of those pthreads programs that do, I suspect many of them
could be improved by simple changes such as splitting up locks.
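
To put rough numbers on the ratio mentioned in the first point,
something like the following micro-benchmark could be used (a sketch
only, not something I have measured for this message): it ping-pongs
between two ucontext threads within one process, then between two
processes over a pair of pipes. The pipe figure overstates a bare
kernel task switch, since each round trip also includes four system
calls, and the swapcontext figure is pessimistic in the other
direction, since glibc's version saves and restores the signal mask.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <ucontext.h>

#define ITERS 100000

static ucontext_t a_ctx, b_ctx;

static void bounce(void)                /* second user thread */
{
    for (;;)
        swapcontext(&b_ctx, &a_ctx);    /* switch straight back */
}

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    int i, p2c[2], c2p[2];
    char c = 'x';
    char *stack = malloc(64 * 1024);
    double t;

    /* 1. user-level switches: swapcontext ping-pong in one process */
    getcontext(&b_ctx);
    b_ctx.uc_stack.ss_sp = stack;
    b_ctx.uc_stack.ss_size = 64 * 1024;
    makecontext(&b_ctx, bounce, 0);

    t = seconds();
    for (i = 0; i < ITERS; i++)
        swapcontext(&a_ctx, &b_ctx);
    printf("user-level round trip:  %.2f us\n",
           (seconds() - t) * 1e6 / ITERS);

    /* 2. kernel task switches: one-byte ping-pong between a parent
       and a child process over two pipes (two switches plus four
       system calls per round trip) */
    pipe(p2c);
    pipe(c2p);
    if (fork() == 0) {
        close(p2c[1]);                  /* keep only the ends we use, */
        close(c2p[0]);                  /* so EOF works at the end    */
        for (;;) {
            if (read(p2c[0], &c, 1) != 1)
                _exit(0);               /* parent has gone away */
            write(c2p[1], &c, 1);
        }
    }
    close(p2c[0]);
    close(c2p[1]);
    t = seconds();
    for (i = 0; i < ITERS; i++) {
        write(p2c[1], &c, 1);
        read(c2p[0], &c, 1);
    }
    printf("kernel-level round trip: %.2f us\n",
           (seconds() - t) * 1e6 / ITERS);
    return 0;
}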

Given these last two points, for programs which aren't concerned with
pthreads compatibility, and have threads blocking on user-level
synchronization mechanisms tens of thousands of times a second,
user-level threads packages can be a wonderful thing. Your simulation
application certainly falls into this category.
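
For programs in that category, the win comes from the fact that a
contended lock can be resolved entirely in user space: the blocking
thread just switches to another user thread instead of trapping into
the kernel. A toy sketch of the idea, with two cooperative ucontext
threads and a "mutex" that is nothing more than a flag (names and
structure are mine, not taken from any existing package):

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static int lock_held;                     /* the whole "mutex" */

/* Contended lock: instead of a system call, switch to the other user
   thread, which will eventually drop the lock and switch back. */
static void worker_lock(void)
{
    while (lock_held)
        swapcontext(&worker_ctx, &main_ctx);
    lock_held = 1;
}

static void worker(void)
{
    int i;
    for (i = 0; i < 3; i++) {
        worker_lock();                    /* blocks in user space only */
        printf("worker got the lock (round %d)\n", i);
        lock_held = 0;                    /* unlock */
        swapcontext(&worker_ctx, &main_ctx);
    }
}

int main(void)
{
    int i;
    char *stack = malloc(64 * 1024);

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = 64 * 1024;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    for (i = 0; i < 3; i++) {
        lock_held = 1;                    /* main takes the lock... */
        swapcontext(&main_ctx, &worker_ctx);  /* worker finds it held */
        lock_held = 0;                    /* ...main releases it */
        swapcontext(&main_ctx, &worker_ctx);  /* worker acquires it */
    }
    free(stack);
    return 0;
}

In a real package the flag would be a proper lock word tested with an
atomic operation, and the switch would go through a scheduler picking
from a run queue, but the essential property is the same: no kernel
entry on the contended path. How often real pthreads programs hit
that path is exactly the open question above.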

Suppose we had a many-user-threads-per-kernel-task pthreads
implementation which was very lean, so that its cost for typical
pthreads programs was minimal. The cheap(er) user-level
context-switches would bring great benefits for those programs
that unavoidably have lots of threads waiting on user-level
synchronization mechanisms. On balance, such a pthreads implementation
would be preferable to a one-thread-one-task implementation such as
LinuxThreads.

But the only way to prove that such an implementation is possible is
by writing one, then taking measurements with real programs. Until
then, it is only possible to speculate. I am agnostic; the points
above indicate that it would be tricky, but not that it would be
impossible.

Dave Wragg

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/