Re: RedHat 5.0 and threads

David Wragg (dpw@doc.ic.ac.uk)
Tue, 13 Jan 1998 02:23:22 GMT


Christopher Blizzard <blizzard@appliedtheory.com> writes:
>
> [snip]
>
> However, some threading performance ( note: not context switches ) numbers
> that I have seen have actually rated Linux the lowest behind solaris,
> digital unix and Win32. The reason for this ( I think ) is the way that
> pthreads is implemented. Linuxthreads uses signals with a manager thread
> to do much of the thread management and I'm speculating that that may be
> part of the problem.

It depends what you measure. From my measurements on my single PPro
Linux machine, kernel 2.1.7?, the main performance problem
with LinuxThreads is thread creation. A clone() takes around 2500
cycles. pthread_create() takes 35500 cycles. The difference must be
due to synchronisation and context switches between the creating
thread and the manager thread.

(I've never run the same or similar tests on any other OS, so I don't
know what the mainstream-OS competition is like)

LinuxThreads could be modified so that the creating thread does the
clone() directly, rather than requesting it from the manager, though
at the cost of some added complexity.

>
> This is also a problem with the current pthreads implementation...it uses
> SIGUSR1 and SIGUSR2 to communicate between the manager thread and the
> child threads. This may seem fine but there are applications out there
> that use those signals, an example being Sun's JDK. I looked into doing a
> native port but had to throw it out the window because of that.
>

Why not implement Java threads using the basic kernel threading
facilities (clone for thread creation and signals for synchronisation)
rather than going via LinuxThreads? Most of the complexity of
LinuxThreads is to do with implementing the details of POSIX threads,
but this is not directly needed for Java threads. Also, for garbage
collection to work properly you need more facilities than pthreads
provides, so you would need to work around LinuxThreads anyway
(possibly creating dependencies on the current LinuxThreads
implementation). There was a thread relating to this in
comp.os.linux.development.system about a month ago.

Regarding SIGUSR1 and SIGUSR2: in the LinuxThreads FAQ it says:

In the meantime, you can try to use kernel-reserved signals either
in your program or in LinuxThreads. For instance,
SIGSTKFLT and SIGUNUSED appear to be
unused in the current Linux kernels for the Intel x86 architecture.
To use these in LinuxThreads, the only file you need to change
is internals.h, more specifically the two lines:

#define PTHREAD_SIG_RESTART SIGUSR1
#define PTHREAD_SIG_CANCEL SIGUSR2

Replace them by e.g.

#define PTHREAD_SIG_RESTART SIGSTKFLT
#define PTHREAD_SIG_CANCEL SIGUNUSED

Warning: you're doing this at your own risks.

Also, it seems that glibc2 has (by default) support for 1024 signals,
and 2.1 kernels have (by default) support for 64 signals. I have no
idea if these actually match up to provide >32 signal numbers.

> Some thread support in the kernel would help alleviate these problems. I
> realize there's an argument that threads belong in userland and the kernel
> should just provide the methods for creating new threads of execution
> which seems to be the way that things are done right now. I think that
> this is causing a lot of problems though.

With the exception of support for debugging, the kernel's support for
threads is already good. The problems are with implementations of
specific thread APIs, though for typical pthreads applications (of
which language implementation is certainly not one) LinuxThreads is
very good too.

> I would contend, only from speculation, that moving some things out of
> userland and into the kernel would actually help. Right now with the use
> of a manager thread with signals has to be really expensive ( how
> expensive is signal delivery vs. a regular context switch anyway? someone
> with more experience should comment on this ).

Here, context switches are around 2000 cycles. Two (SCHED_OTHER,
equal-priority) threads synchronising using signals can take around
2600 cycles:

        thrA                                      thrB

          |                                         .
   kill(thrB, SIG) ----------                       .
          |              ^                          .
 enter sigsuspend() -----|-------                   .
          .              |    ^                     .
          .      2600 cycles  |                     .
          .              |   1350 cycles!           .
          .              |    |                     .
          .              v    v                     .
          .            ----------  ---------- sigsuspend() exits
                                                    |
                                                    |
                                                    |
                                                    V

> Moving simple thread
> constructs into the kernel like semaphores, mutexes and reader-writer
> locks would really help make things more elegant. Also, race
> conditions especially on multi-processor machines would probably be less
> of a problem.

Moving these things into the kernel just shifts any difficulties onto
kernel developers, and IMHO performance would not be improved.

Mutexes should certainly not be in the kernel. In a well-designed
program, contention for any particular mutex will be very low, so
locking or unlocking a mutex in user space takes fewer cycles than any
system call would (<140 cycles to lock then unlock a LinuxThreads mutex).

>
> Also, if there was a ptrace or /proc interface, adding the needed features
> to the debugger would be a lot easier as well. You could stop all
> threads, start threads, list the state of all your mutexes and semaphores
> all through one interface. This is something that the debugger really
> needs and I would imagine that it's tougher to add this to a userland
> threads implementation than a kernel based one.

Everyone knows that the kernel needs support for debugging threads,
and that we need a 'ps' which doesn't show every single thread.

Threads are already kernel-based. If a debugger wants to see the state
of a mutex, it can just look at the mutex data-structure in user-space
(a nice scriptable, extensible debugger could be told how to do such
things).

If each mutex had some data in the kernel, so they could be reported
via /proc, the kernel memory requirements would get silly.

>
> Of course all of this is just speculation on my part. I'm not a kernel
> hacker....yet. :) Getting a dialogue running about this is what I'm
> really interested in. I would like to see what people who have more
> experience dealing with implementing these kinds of systems have to say
> about the performance and design problems that come from what I'm
> suggesting here.

Except for the lack of debugging and the ps thing, kernel threads are
generally fine right now. And if you're not too fussed about the more
fiddly details of POSIX threads, and your application doesn't spend
most of its time in thread creation, then LinuxThreads is great too.

(I have the code to generate all the timings given here, but it's
horrible stuff. If anyone else would like to see it, mail me,
and I'll make it readable and useable and distribute it.)

Dave Wragg