RE: Industry db benchmark result on recent 2.6 kernels

From: Chen, Kenneth W
Date: Thu Mar 31 2005 - 17:20:49 EST


Linus Torvalds wrote on Thursday, March 31, 2005 12:09 PM
> Btw, I realize that you can't give good oprofiles for the user-mode
> components, but a kernel profile with even just single "time spent in user
> mode" datapoint would be good, since a kernel scheduling problem might
> just make caches work worse, and so the biggest negative might be visible
> in the amount of time we spend in user mode due to more cache misses..

I was going to bring it up in another thread. Since you brought it up, I
will ride it along.

The low point in 2.6.11 could very well be the change in the scheduler.
It does too many load balancing in the wake up path and possibly made a
lot of unwise decision. For example, in try_to_wake_up(), it will try
SD_WAKE_AFFINE for task that is not hot. By not hot, it looks at when it
was last ran and compare to a constant sd->cache_hot_time. The problem
is this cache_hot_time is fixed for the entire universe, whether it is a
little celeron processor with 128KB of cache or a sever class Itanium2
processor with 9MB L3 cache. This one size fit all isn't really working
at all.

We had experimented that parameter earlier and found it was one of the major
source of low point in 2.6.8. I debated the issue on LKML about 4 month
ago and finally everyone agreed to make that parameter a boot time param.
The change made into bk tree for 2.6.9 release, but somehow it got ripped
right out 2 days after it went in. I suspect 2.6.11 is a replay of 2.6.8
for the regression in the scheduler. We are running experiment to confirm
this theory.

That actually brings up more thoughts: what about all other sched parameters?
We found values other than the default helps to push performance up, but it
is probably not acceptable to pick a default number from a db benchmark.
Kernel needs either a dynamic closed feedback loop to adapt to the workload
or some runtime tunables to control them. Though the latter option did not
go anywhere in the past.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/