For heavy-duty computation on multiprocessors, you are right in
the ideal case. In the less-than-ideal case (like the
application I am working on under Solaris right now) there is
a significant amount of locking of shared data structures involved.
If the threads were completely free-running, kernel scheduling
would be just fine. Once mutex contention gets past a certain
point, you end up with threads that have to block.
With kernel-scheduled threads, every time a thread has to block on a
mutex you are paying for a context switch in the kernel.
One way to lighten this load is to use the N-to-M model that Solaris uses.
I use a few more user threads than kernel threads (LWPs in Solaris
terminology). Then, when a thread blocks on a mutex because another
thread has it locked already, the user mode thread scheduler can
usually find another thread capable of getting some work done without
resorting to a system call.
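
To make the setup concrete, here is a rough sketch of the kind of thing I mean, written against POSIX threads rather than taken from my actual application. The names run_workers, worker, lock and counter are made up for illustration. It asks for process-scope (unbound) threads, which is how you request user-level scheduling on an N-to-M implementation; on a system that only does one kernel thread per user thread the request may simply be refused, and the sketch carries on with the default.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state: one counter guarded by one mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Each worker contends for the same mutex on every iteration. */
static void *worker(void *arg)
{
    long iters = *(long *)arg;
    long i;

    for (i = 0; i < iters; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/*
 * Start nthreads workers, wait for them, and return the final counter
 * (or -1 if memory for the thread ids could not be allocated).
 */
long run_workers(int nthreads, long iters)
{
    pthread_t *tids;
    pthread_attr_t attr;
    int started = 0;
    int i;

    tids = malloc((size_t)nthreads * sizeof *tids);
    if (tids == NULL)
        return -1;

    counter = 0;
    pthread_attr_init(&attr);

    /*
     * Ask for unbound (process-scope) threads so a user-mode scheduler
     * can multiplex them over fewer LWPs. Implementations without
     * N-to-M scheduling may reject this; the result is ignored and the
     * attribute keeps its default scope in that case.
     * (With Solaris threads, the number of LWPs can additionally be
     * hinted with thr_setconcurrency(); that call is not made here.)
     */
    (void)pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], &attr, worker, &iters) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_attr_destroy(&attr);
    free(tids);
    return counter;
}
```

Calling run_workers(8, 100000) returns 800000 provided all eight threads start. The point is not the count but where a blocked thread waits: with unbound threads on an N-to-M system, the user-mode scheduler can hand the LWP to another runnable thread instead of going into the kernel.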
Dan McCoy Pixar mccoy@pixar.com