I ask again - why do you need to do this? Is there any other reason than
the cost of a thread context switch vs. the cost of a process context switch?
If the answer is no, and a process context switch costs about the same as a
thread context switch, then the whole need for the complicated two level
model goes away.
: >It's all a crock of doo doo that was deemed to be necessary because
: >"context switches cost too much".
:
: well, context switches are painful as is any kernel crossing in high
: performance computing. imagine user level networking on high speed
: connections that can have round trip times in the ~50us range (this is
: a software implementation in our lab, SGI's GSN is committed to round
: trip times of around 7us roundtrip hardware latency), if you
I've (a) spent a great deal of time thinking about this very issue, and
(b) worked on GSN at SGI, and (c) am under contract with LLNL working
on exactly this issue, amongst others. I'm pretty in tune with the
problem space and I don't see that it has any bearing on the discussion
at all. If you are going to context switch for each packet, you can
kiss your performance goodbye whether you are context switching threads
or processes. Neither is fast enough to hit the needed 10 usec round
trip time that all the HPC folks like LLNL want.
: you could say that going to an event loop programming model would be
: much more appropriate for this system, but if you can get a thread
: model to handle this type of load, then do it.
Agreed with the first part, couldn't agree with the second part - it ain't
happening - the context switches will be kernel level context switches
whether they are "threads" or "processes" since the event generated
is a kernel level event. Yeah, you can deliver the packet into user
space directly, but have fun getting the kernel to tell your user level
scheduler to run a new thread. Sure it can be done, and has been done,
but an old quote of mine is "Architect: someone who knows the difference
between what could be done and what should be done". My architect hat
says this is not a "should be done"; your view may be different.
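
For what it's worth, the event loop model mentioned above is just one process
servicing all its descriptors itself, so the only kernel crossing is the wait.
A rough sketch, purely illustrative - fds[]/nfds are assumed to have been set
up by whatever opened the connections, the 1500 byte buffer is just an
Ethernet-ish guess, and "handling" a packet is just a read() here:

#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>

void event_loop(int *fds, int nfds)
{
        char buf[1500];         /* Ethernet-ish packet buffer */
        fd_set rset;
        int i, maxfd;

        for (;;) {
                FD_ZERO(&rset);
                maxfd = 0;
                for (i = 0; i < nfds; i++) {
                        FD_SET(fds[i], &rset);
                        if (fds[i] > maxfd)
                                maxfd = fds[i];
                }
                /* one kernel crossing wakes us up for any ready fd */
                if (select(maxfd + 1, &rset, 0, 0, 0) <= 0)
                        continue;
                for (i = 0; i < nfds; i++)
                        if (FD_ISSET(fds[i], &rset))
                                (void)read(fds[i], buf, sizeof(buf));
        }
}
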
: so, i am really confused by this: the sample program attached below,
: run against the MIT pthreads user level package and the glibc
: linuxthreads implementation, shows a bit of a bigger difference than
: you suggest over a 2.2.9 kernel.
That's a threads package problem. Our numbers agree nicely on the 3.5
usecs time - I get 1.75 usecs for a 2 process context switch plus about
2.7 usecs for overhead of passing a word through pipes. So if the glibc
is getting 20 usecs or so, they are busted. But that has nothing to do
with our discussion here: my claim is that if the process performance is
good enough, the whole need for threads as a concept goes away - a thread
is just a different set of attributes on the process, i.e., shared VM,
signals, PWD, whatever. And I'm assuming that you are happy with the
3 usecs number and that's about the same as the process number. So where's
the need for threads?
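
In case the methodology isn't clear, here is a minimal sketch of that kind
of 2 process pipe ping-pong measurement (this is the idea, not the actual
lmbench lat_ctx source; ITERS is just a convenient number):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define ITERS   100000

int main(void)
{
        int p2c[2], c2p[2], i;
        char buf = 'x';
        struct timeval start, end;
        double usecs;

        if (pipe(p2c) < 0 || pipe(c2p) < 0) {
                perror("pipe");
                exit(1);
        }
        if (fork() == 0) {                      /* child: echo the byte back */
                for (i = 0; i < ITERS; i++) {
                        read(p2c[0], &buf, 1);
                        write(c2p[1], &buf, 1);
                }
                exit(0);
        }
        gettimeofday(&start, NULL);
        for (i = 0; i < ITERS; i++) {           /* parent: ping-pong */
                write(p2c[1], &buf, 1);
                read(c2p[0], &buf, 1);
        }
        gettimeofday(&end, NULL);
        wait(NULL);

        usecs = (end.tv_sec - start.tv_sec) * 1e6 +
                (end.tv_usec - start.tv_usec);
        /* two switches per round trip; pipe read/write cost is still in here */
        printf("%.2f usecs per switch (including pipe overhead)\n",
               usecs / (2.0 * ITERS));
        return 0;
}
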
: the cache miss domination argument doesn't seem to make alot of sense
: unless you want to believe you are flushing the whole cache which you
: aren't because you are sharing the VM space between the threads.
Caches aren't infinite in size. Yeah, it's true that benchmarks fit
nicely in the L1 cache so we see these nice 1-3 usec context switch
numbers. But those numbers go up by a factor of ~100 when you add 32KB
worth of data/instructions to the workload performed by each thread.
Your threads do do something, right? So if you have N threads, each with
a cache working set of C bytes, you will start generating cache misses
once N * C approaches the size of the L1 cache. As soon as you fall out
of the L1 cache, the numbers become dominated by the cache miss time.
If you look at the graphs you can get from lmbench, you can see the
effects of cache sizes on context switches. I measure context switch
time as a function of number of processes and process working set, and
plot the results. Yeah, the 2 process, 0 size numbers are ~1 usec, but
the 2 process 32K sized ones are 22 usecs, the 8 process 32K ones are
100 usecs, etc. So once again, people can misuse benchmarks to try and
make their point, but I think we are trying to get at the truth here, not
at better benchmark numbers. And the truth is that context switch times
are _not_ represented by the 2 process, 0 size case. That's a benchmark.
In the real world, you switch to that context to do something and that
is going to have a cost that can exceed the context switch by multiple
orders of magnitude.
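
To make the working set point concrete, the only change to the ping-pong
sketch above is to drag some data through the cache on every pass; the
32KB figure here is illustrative, not a measurement:

#define WSIZE   (32 * 1024)             /* per-process cache working set */

static volatile char work[WSIZE];

static int touch_working_set(void)
{
        int i, sum = 0;

        /* evicts roughly WSIZE bytes of whatever the other process had in L1 */
        for (i = 0; i < WSIZE; i++)
                sum += work[i];
        return sum;
}

Call touch_working_set() in each loop body above before the write(); once
(number of processes) * WSIZE exceeds the L1 cache, the measured "context
switch" time is mostly cache refill time.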