The following is a summary of the responses to my original message
concerning Linux's Kernel Threads.
1) Kernel Threads Are Usable, But Have Serious Problems
Thanks to Randy Chapman and a sleepy sct, whom I happened to corner on
IRC one night. They pointed out that while kernel threads are quite
usable, there are many unresolved problems that keep them from
being used for serious applications. The biggest problem is that
threads are likely to become confused when page tables are modified,
especially when pages are removed via munmap(). Concurrent calls to
mmap() or a call to munmap() could cause the page tables to become
muddled, or memory areas might 'disappear' out from under system
calls, which will eventually cause a kernel oops. I bet it's nothing that
a little more locking, some extra calls to verify_area(), and some
extra error checking couldn't cure in a future release.
Swapping is 100% thread safe, I am told.
2) User-mode locking
The kernel does not provide any locking functions to help threads
synchronise and not step all over each other. Apparently a locking
mechanism could be hacked out of the IPC functions, but that is
considered to be quite a dirty approach. Someone posted some code to
the linux-smp list not long ago which implemented a spin lock in user
mode by making use of the atomic compare and exchange instruction on
the iX86. It is as atomic as can be, and is guaranteed
FIFO. Unfortunately, no one ever filled in the ASM portion of the code,
so if you come across it in the archive, be prepared to do a little
ASM. I'm told that Postgres95 uses the same technique, so you might be
able to snag the asm stuff from that.
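For a rough idea of what such a lock looks like, here is a sketch of a
ticket lock, which is one well-known way to get a FIFO spin lock out of
compare and exchange. It is not the code from the linux-smp list. It
assumes a compiler with C11 <stdatomic.h> in place of hand-written ASM,
and the ticket_lock names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* FIFO spin lock: tickets are handed out in order and served in order. */
struct ticket_lock {
    atomic_uint next_ticket;   /* next ticket to hand out */
    atomic_uint now_serving;   /* ticket currently allowed in */
};

void ticket_lock_init(struct ticket_lock *l)
{
    assert(l != NULL);
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

void ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned int mine = atomic_load(&l->next_ticket);

    /* Claim a ticket with compare and exchange. On failure `mine` is
     * refreshed with the current value and the loop tries again; on
     * success `mine` still holds the ticket we took. */
    while (!atomic_compare_exchange_weak(&l->next_ticket, &mine, mine + 1))
        ;

    /* Spin until our ticket comes up. */
    while (atomic_load(&l->now_serving) != mine)
        ;
}

void ticket_lock_release(struct ticket_lock *l)
{
    /* Hand the lock to the holder of the next ticket. */
    atomic_fetch_add(&l->now_serving, 1);
}
```

Because each waiter spins on its own ticket number, threads enter in
the order they took tickets. The price is that a waiter burns CPU until
its turn comes, so this only suits short critical sections.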
3) Advantages of Kernel Threads
In addition to letting you program in a linear model without having
to hack together event callbacks and other fun stuff, kernel threads
will take advantage of multiple processors (on SMP kernels), and threads
can continue to execute even if one thread is blocked on disk I/O or
paging.
4) Lib Support
Looking back to #1, you should disable the new mmap()-based malloc
when using libc with threads. Although there is support for pthreads
in libc, it may not be horribly useful because 1) locking is
implemented as turning off the pthreads scheduler and 2) hju chose to
implement thread-safe functions using the same technique pthreads
did (locking), something that most people seem to be highly critical
of and consider to be a dumb mistake. In short, you will probably have
to write your own thread-safe versions of everything that depends on
global variables. The consensus is that the best way to implement
them is to pass a pointer to user-allocated memory for structures
normally returned in global variables.
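The caller-supplied-buffer approach can be illustrated with a small
sketch. label_for() and label_for_r() are hypothetical functions
invented for this example (the _r suffix just follows the usual naming
habit for reentrant variants); they are not part of libc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* NOT thread safe: every caller shares one static buffer, so a second
 * call overwrites the result of the first. */
const char *label_for(int id)
{
    static char buf[32];

    snprintf(buf, sizeof buf, "item-%d", id);
    return buf;
}

/* Thread-safe variant: the caller supplies the storage.
 * Returns buf on success, NULL if the buffer is too small. */
char *label_for_r(int id, char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "item-%d", id);
    if (n < 0 || (size_t)n >= len)
        return NULL;
    return buf;
}

/* Two results can coexist, which the static-buffer version cannot do.
 * Returns 1 if both labels were produced and are different. */
int labels_are_independent(void)
{
    char a[32], b[32];

    return label_for_r(1, a, sizeof a) != NULL
        && label_for_r(2, b, sizeof b) != NULL
        && strcmp(a, b) != 0;
}
```

Since each thread passes its own buffer, no lock is needed at all, which
is the main attraction over the locking approach criticised above.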
5) Malloc
Everyone agreed that malloc *MUST* use locking. There are steps that
can be taken to limit the time and scope of the memory pool that gets
locked, but there must be locking at some point. Calls to mmap() are
thread-safe or nearly so. Calls to mremap() are
thread-safe. Calls to munmap() are NOT thread-safe. Calls to sbrk() ARE
thread-safe. Two threads calling an m*map() function in the same process
may have an undefined result.
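To show what "limit the time and scope" can mean in practice, here is a
toy allocator that takes its lock only around the pool bookkeeping. It
is a sketch under heavy assumptions, not how any real malloc works:
pool_alloc(), POOL_SIZE and the fixed static pool are invented for the
example, nothing is ever freed, and the spin lock uses C11
<stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE 4096            /* assumption: one small fixed pool */

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;          /* protected by pool_lock */
static atomic_flag pool_lock = ATOMIC_FLAG_INIT;

/* Hypothetical bump allocator. Returns NULL when the pool is exhausted. */
void *pool_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    void *p = NULL;

    assert(size > 0);
    if (size > POOL_SIZE)
        return NULL;              /* also keeps the rounding from overflowing */

    /* Round up outside the lock; only the bookkeeping needs protection. */
    size = (size + align - 1) & ~(align - 1);

    while (atomic_flag_test_and_set_explicit(&pool_lock, memory_order_acquire))
        ;                         /* spin until the lock is ours */
    if (size <= POOL_SIZE - pool_used) {
        p = pool + pool_used;
        pool_used += size;
    }
    atomic_flag_clear_explicit(&pool_lock, memory_order_release);

    return p;
}
```

Giving each thread its own pool would shrink the scope of the lock
further, at the cost of memory that sits idle in quiet threads.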
Last but not least, by popular demand, Linus's clone() example:
---
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#include <linux/unistd.h>

#define STACKSIZE 16384

#define CSIGNAL         0x000000ff  /* signal mask to be sent at exit */
#define CLONE_VM        0x00000100  /* set if VM shared between processes */
#define CLONE_FS        0x00000200  /* set if fs info shared between processes */
#define CLONE_FILES     0x00000400  /* set if open files shared between processes */
#define CLONE_SIGHAND   0x00000800  /* set if signal handlers shared */

/*
 * This is a "kind-of" thr_create() as in pthreads, but not really.
 * It needs some fleshing out to work like pthreads thr_create().
 */
int start_thread(void (*fn)(void *), void *data)
{
        long retval;
        void **newstack;

        /*
         * allocate new stack for subthread
         */
        newstack = (void **) malloc(STACKSIZE);
        if (!newstack)
                return -1;

        /*
         * Set up the stack for child function, put the (void *)
         * argument on the stack.
         */
        newstack = (void **) (STACKSIZE + (char *) newstack);
        *--newstack = data;

        /*
         * Do clone() system call. We need to do the low-level stuff
         * entirely in assembly as we're returning with a different
         * stack in the child process and we couldn't otherwise guarantee
         * that the program doesn't use the old stack incorrectly.
         *
         * Parameters to clone() system call:
         *      %eax - __NR_clone, clone system call number
         *      %ebx - clone_flags, bitmap of cloned data
         *      %ecx - new stack pointer for cloned child
         *
         * In this example %ebx is CLONE_VM | CLONE_FS | CLONE_FILES |
         * CLONE_SIGHAND which shares as much as possible between parent
         * and child. (We or in the signal to be sent on child termination
         * into clone_flags: SIGCHLD makes the cloned process work like
         * a "normal" unix child process)
         *
         * The clone() system call returns (in %eax) the pid of the newly
         * cloned process to the parent, and 0 to the cloned process. If
         * an error occurs, the return value will be the negative errno.
         *
         * In the child process, we will do a "jsr" to the requested function
         * and then do a "exit()" system call which will terminate the child.
         */
        __asm__ __volatile__(
                "int $0x80\n\t"         /* Linux/i386 system call */
                "testl %0,%0\n\t"       /* check return value */
                "jne 1f\n\t"            /* jump if parent */
                "call *%3\n\t"          /* start subthread function */
                "movl %2,%0\n\t"
                "int $0x80\n"           /* exit system call: exit subthread */
                "1:\t"
                :"=a" (retval)
                :"0" (__NR_clone),"i" (__NR_exit),
                 "r" (fn),
                 "b" (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD),
                 "c" (newstack));

        if (retval < 0) {
                errno = -retval;
                retval = -1;
        }
        return retval;
}

int show_same_vm;

void cloned_process_starts_here(void * data)
{
        printf("child:\t got argument %d as fd\n", (int) data);
        show_same_vm = 5;
        printf("child:\t vm = %d\n", show_same_vm);
        close((int) data);
}

int main()
{
        int fd, pid;

        fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
                perror("/dev/null");
                exit(1);
        }
        printf("mother:\t fd = %d\n", fd);

        show_same_vm = 10;
        printf("mother:\t vm = %d\n", show_same_vm);

        pid = start_thread(cloned_process_starts_here, (void *) fd);
        if (pid < 0) {
                perror("start_thread");
                exit(1);
        }

        sleep(1);
        printf("mother:\t vm = %d\n", show_same_vm);
        if (write(fd, "c", 1) < 0)
                printf("mother:\t child closed our file descriptor\n");
}
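
On a current glibc system the same experiment can probably be repeated
without inline assembly, because the C library exports a clone()
wrapper that switches to the new stack itself. What follows is a sketch
under that assumption. run_clone_demo(), child_main() and
DEMO_STACK_SIZE are names made up for illustration, the child avoids
stdio to keep the shared-VM case simple, and the parent waits with
waitpid() instead of sleep(1).

```c
#define _GNU_SOURCE             /* must precede every #include; exposes clone() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_STACK_SIZE (64 * 1024)   /* assumption: ample for this child */

static volatile int shared_value;     /* written by child, read by parent */

/* Child entry point: own stack, but the parent's address space. */
static int child_main(void *arg)
{
    assert(arg != NULL);
    shared_value = 5;       /* visible to the parent because of CLONE_VM */
    close(*(int *)arg);     /* CLONE_FILES: closes the parent's fd too */
    return 0;               /* becomes the child's exit status */
}

/* Returns 0 if the parent observed both side effects, -1 otherwise. */
int run_clone_demo(void)
{
    char *stack;
    int fd, pid;

    fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    stack = malloc(DEMO_STACK_SIZE);
    if (!stack) {
        close(fd);
        return -1;
    }

    shared_value = 10;
    /* The stack grows downward on x86, so pass the top of the block. */
    pid = clone(child_main, stack + DEMO_STACK_SIZE,
                CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                &fd);
    if (pid < 0) {
        free(stack);
        close(fd);
        return -1;
    }

    /* SIGCHLD in the flags lets waitpid() reap the child as usual. */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;          /* child may still be using the stack: keep it */
    free(stack);

    if (shared_value != 5)
        return -1;
    /* The descriptor should now be invalid in the parent as well. */
    if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
        return -1;
    return 0;
}
```

The flag set is the same one used in the assembly version, so the child
shares memory, filesystem information, open files and signal handlers
with its parent.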