Re: Default cache_hot_time value back to 10ms

From: Peter Williams
Date: Wed Oct 06 2004 - 17:57:56 EST


Chen, Kenneth W wrote:
Chen, Kenneth W wrote on Tuesday, October 05, 2004 10:31 AM

We have experimented with a similar thing, bumping sd->cache_hot_time up to
a very large number, like 1 sec. What we measured was an equally low throughput,
but that was because there was not enough load balancing.

Since we are talking about load balancing, we decided to measure various
values of the cache_hot_time variable to see how it affects app performance. We
first established a baseline number with the vanilla base kernel (default of 2.5ms),
then swept that variable up to 1000ms. All of the experiments were done with
Ingo's patch posted earlier. Here are the results (test environment is a 4-way
SMP machine, 32 GB memory, 500 disks, running an industry standard db transaction
processing workload):

cache_hot_time | workload throughput | idle
--------------------------------------------
2.5ms          | 100.0               | 0%
5ms            | 106.0               | 0%
10ms           | 112.5               | 1%
15ms           | 111.6               | 3%
25ms           | 111.1               | 5%
250ms          | 105.6               | 7%
1000ms         | 105.4               | 7%

Clearly the default value for SMP has the worst application throughput (the
peak at 10ms is 12.5% higher). When set too low, the kernel is too aggressive
about load balancing and we still see cache thrashing despite the perf fix.
However, if set too high, the kernel gets too conservative and does not do
enough load balancing.


Ingo Molnar wrote on Wednesday, October 06, 2004 12:48 AM

could you please try the test in 1 msec increments around 10 msec? It
would be very nice to find a good formula and the 5 msec steps are too
coarse. I think it would be nice to test 7,9,11,13 msecs first, and then
the remaining 1 msec slots around the new maximum. (assuming the
workload measurement is stable.)


I should've posted the whole thing yesterday; we also had measurements at 7.5
and 12.5 ms. Here are the results (repeating 5, 10, 15 for easy reading).

cache_hot_time | workload throughput
--------------------------------------
5 ms           | 106.0
7.5 ms         | 110.3
10 ms          | 112.5
12.5 ms        | 112.0
15 ms          | 111.6



This value defaulted to 10ms before the domain scheduler; why did the domain
scheduler need to change it to 2.5ms? And on what basis was that decision
made? We are proposing to change that number back to 10ms.

agreed. What value does cache_decay_ticks have on your box?



I see all the fancy calculations with cache_decay_ticks on x86, but nobody
actually uses it in the domain scheduler. Anyway, my box has that value
hard coded to 10ms (ia64).


If you fit a quadratic equation to this data, take the first derivative, and
solve for zero, you get the cache_hot_time that maximizes the throughput.
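
A minimal sketch of that fit, in plain Python with no external libraries. It
assumes that only the five samples between 5 ms and 15 ms quoted above are
used (the sweep out to 1000 ms is clearly not parabolic), and the names
SAMPLES and fit_quadratic are made up for illustration. For the fitted curve
y = a*x^2 + b*x + c the derivative 2*a*x + b is zero at x = -b / (2*a), which
is a maximum only if a is negative.

```python
# Measured points from the thread: (cache_hot_time in ms, relative throughput).
# Assumption: only the samples near the peak are fitted.
SAMPLES = [(5.0, 106.0), (7.5, 110.3), (10.0, 112.5), (12.5, 112.0), (15.0, 111.6)]


def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums that make up the normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented matrix; the unknowns are ordered (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


a, b, c = fit_quadratic(SAMPLES)
if a >= 0:
    raise ValueError("fitted parabola has no maximum")

# Zero of the first derivative: 2*a*x + b = 0.
best_ms = -b / (2 * a)
best_throughput = a * best_ms ** 2 + b * best_ms + c
print(f"estimated optimum cache_hot_time: {best_ms:.2f} ms")
print(f"fitted throughput at that point:  {best_throughput:.1f}")
```

With these five points the vertex lands between the 10 ms and 12.5 ms
samples. With so few measurements the estimate is rough, and it shifts if
the 2.5 ms or 25 ms points are added to the fit.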

Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce