That's not a Linux problem, that's a problem with your benchmark design.
The lmbench ctx switch test varies by about 5% even when I'm wildly
moving the mouse in X windows and running the benchmark in a "while
true" shell loop. You are getting more than 100% variance in your
benchmark. What valid conclusion could you possibly draw from those
results?
[ comments about the variance being due to cache conflicts deleted ]
There is no possible way you would get 100% variance from cache misses
in this sort of test. (a) You are just calling sched_yield(), so there is
virtually nothing in the cache footprint; where would the cache conflicts
come from? (b) I'm sitting here running a 16-process context switch test
over and over, and I'm seeing about 4-6% variance from run to run. How
come I'm not seeing your variance?
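Just to be concrete about how small the footprint is, here's a sketch of
what such a test looks like (this is an illustration, not the lmbench
source, and the process count / iteration count are made up). Every
process runs the same few lines; that loop is essentially the entire
working set:

/*
 * Sketch of a sched_yield() context switch test: fork nproc
 * processes, all of which spin calling sched_yield(), and each
 * reports its own per-yield time.  Illustration only.
 */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int nproc = argc > 1 ? atoi(argv[1]) : 2;
	int i, iters = 100000;
	struct timeval start, stop;

	/* fork nproc-1 children; parent and children all yield */
	for (i = 1; i < nproc; i++) {
		if (fork() == 0)
			break;
	}
	gettimeofday(&start, 0);
	for (i = 0; i < iters; i++)
		sched_yield();
	gettimeofday(&stop, 0);
	fprintf(stderr, "%.2f usecs per yield\n",
	    ((stop.tv_sec - start.tv_sec) * 1e6 +
	     (stop.tv_usec - start.tv_usec)) / iters);
	return (0);
}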
[ stuff about using the minimum deleted ]
: OK, now let's look at the minimum latency when running with 12 extra
: processes: I get 38.8 us, 32.9 us, 43.5 us, and 38.9 us on successive
: runs.
: So it's fair to say that the time goes from 6.2 us (2 processes on the
: run queue) to 32.9 us (14 processes on the run queue).
No, it is absolutely not fair to say that. Pretend you are refereeing a
paper whose authors wrote: "I'm not really sure what this is doing, my
results varied too much, I'm not sure why, so I took just the minimums
because those didn't vary as much, and I think that looking at the
minimums means XYZ". What would your review say?
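For what it's worth, the honest way to present numbers like those is the
mean and the spread over the runs, not the minimum. A throwaway sketch,
using the four values you quoted:

/*
 * Sketch: report mean and standard deviation over a set of runs
 * rather than just the minimum.  The sample values are the four
 * quoted above; compile with -lm for sqrt().
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double runs[] = { 38.8, 32.9, 43.5, 38.9 };	/* usecs */
	int i, n = sizeof(runs) / sizeof(runs[0]);
	double sum = 0, var = 0, mean;

	for (i = 0; i < n; i++)
		sum += runs[i];
	mean = sum / n;
	for (i = 0; i < n; i++)
		var += (runs[i] - mean) * (runs[i] - mean);
	var /= n;
	printf("mean %.1f usecs, stddev %.1f usecs\n", mean, sqrt(var));
	return (0);
}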
: I've looked at your code and it does similar things to mine (when I
: use mine in -pipe mode). Your benchmark has the added overhead of the
: pipe code. Using sched_yield() gets me closer to the true scheduling
: overhead.
You didn't look closely enough: it carefully factors out everything except
the context switch, and "everything" includes the pipe overhead.
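The idea - and this is a sketch of the technique, not the actual lmbench
code - is to time the pipe operations with both ends in a single process,
where no context switch happens, and subtract that from the two-process
token-passing time:

/*
 * Sketch of factoring out pipe overhead: measure a round of pipe
 * writes/reads within one process (no context switch), then the
 * same traffic bounced between two processes.  The difference is
 * the context switch cost.
 */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static double
usecs(void)
{
	struct timeval tv;

	gettimeofday(&tv, 0);
	return (tv.tv_sec * 1e6 + tv.tv_usec);
}

int
main(void)
{
	int p1[2], p2[2], i, iters = 10000;
	char c = 0;
	double t0, overhead, total;

	pipe(p1); pipe(p2);

	/* overhead: one process does both ends, no switching */
	t0 = usecs();
	for (i = 0; i < iters; i++) {
		write(p1[1], &c, 1); read(p1[0], &c, 1);
		write(p2[1], &c, 1); read(p2[0], &c, 1);
	}
	overhead = (usecs() - t0) / iters;

	if (fork() == 0) {		/* child: bounce the token back */
		for (i = 0; i < iters; i++) {
			read(p1[0], &c, 1);
			write(p2[1], &c, 1);
		}
		_exit(0);
	}
	t0 = usecs();
	for (i = 0; i < iters; i++) {	/* parent: send, wait for echo */
		write(p1[1], &c, 1);
		read(p2[0], &c, 1);
	}
	total = (usecs() - t0) / iters;
	printf("%.2f usecs per context switch\n", (total - overhead) / 2);
	return (0);
}

Each round trip is two context switches and exactly the pipe traffic that
the overhead loop measured, which is why the subtraction is fair.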
------------------
All that said, I'm not saying you haven't stumbled onto a problem.
You may have, or you may not have; we just can't tell from your test.
I can say, however, that your claims of the runq being a problem are
way overblown. My tests show a pretty smooth linear increase of 333ns
per extra background process on a 166MHz Pentium. I think you claimed
that just having 2 or 3 background processes would double the context
switch time on a similar machine: that isn't even close to true.
I kept piling them on and finally doubled the two-process case with 24
background processes (it went from around 5 usecs to 11).
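For reference, the background load is nothing fancy - just compute-bound
spinners sitting on the run queue, something like this sketch (not my
actual harness; the default count is made up):

/*
 * Sketch: pile N compute-bound background processes onto the run
 * queue.  Kill the process group when you're done measuring.
 */
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int i, n = argc > 1 ? atoi(argv[1]) : 24;

	for (i = 0; i < n; i++) {
		if (fork() == 0) {
			for (;;)	/* pure CPU spinner */
				;
		}
	}
	pause();			/* parent just waits */
	return (0);
}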
Given that I've not seen a production workload generate a run queue depth
of 24 in the last 10 years, I'm just not convinced that this is a problem.