Hello Pratik,
On Sat, Feb 22, 2020 at 12:30:01PM +0530, Pratik Rajesh Sampat wrote:
Currently the TEO governor apart from the TEO timer and hit/miss/earlyCould you clarify what this means ? Do you mean that when state 1 is
hit buckets; also gathers history of 8 intervals and if there are
significant idle durations less than the current, then it decides if a
shallower state must be chosen.
The current sliding history window does do a fair job at prediction,
however, the hard-coded window can be a limiting factor for an accurate
prediction and having the window size increase can also linearly affect
both space and time complexity of the prediction.
To complement the current moving window history, an approach is devised
where each idle state separately maintains a weight for itself and its
counterpart idle states to form a probability distribution.
When a decision needs to be made, the TEO governor selects an idle state
based on its timer and other hits/early hits metric. After which, the
probability distribution of that selected idle state is looked at which
gives insight into how probable that state is to occur if picked.
The probability distribution is nothing but a n*n matrix, where
n = drv->state_count.
Each entry in the array signifies a weight for that row.
The weights can vary from the range [0-10000].
For example:
state_mat[1][2] = 3000 means that previously when state 1 was selected,
the probability that state 2 will occur is 30%.
selected, the probability that the CPU will be in state 1 for the
duration corresponding to state 2's residency is 30% ?
Further more, this means that during idle state selection we have O(n)
complexity if n is the number of idle states, since we want to select
a state where we are more likely to reside ?
The trailing zeros correspond to having more resolution while increasingSo we only update the weight in just one cell ?
or reducing the weights for correction.
Currently, for selection of an idle state based on probabilities, a
weighted random number generator is used to choose one of the idle
states. Naturally, the states with higher weights are more likely to be
chosen.
On wakeup, the weights are updated. The state with which it should have
woken up with (could be the hit / miss / early hit state) is increased
in weight by the "LEARNING_RATE" % and the rest of the states for that
index are reduced by the same factor.
To use the example above, if we selected state 1, and we resided in it
for a duration corresponding to state 2's residency, we will only
update state_mat[1][2] ?
I haven't been able to observe any adverse effects of the statelessnessThe advantage of this approach is that unlimited history of idle statesCan the effect of this staleless be observed ? For instance, if we
can be maintained in constant overhead, which can help in more accurate
prediction for choosing idle states.
The advantage of unlimited history can become a possible disadvantage as
the lifetime history for that thread may make the weights stale and
influence the choosing of idle states which may not be relevant
anymore.
have a particular idle entry/exit pattern for a very long duration,
say a few 10s of minutes and then the idle entry/exit pattern changes,
how bad will the weighted approach be compared to the current TEO
governor ?
Aging the weights could be a solution for that, although this RFC does
not cover the implementation for that.
Having a finer view of the history in addition to weighted randomized
salt seems to show some promise in terms of saving power without
compromising performance.
Benchmarks:
Note: Wt. TEO governor represents the governor after the proposed change
Schbench
========
Benchmarks wakeup latencies
Scale of measurement:
1. 99th percentile latency - usec
2. Power - Watts
Command: $ schbench -c 30000 -s 30000 -m 6 -r 30 -t <Threads>
Varying parameter: -t
Machine: IBM POWER 9
+--------+-------------+-----------------+-----------+-----------------+
| Threads| TEO latency | Wt. TEO latency | TEO power | Wt. TEO power |
+--------+-------------+-----------------+-----------+-----------------+
| 2 | 979 | 949 ( +3.06%) | 38 | 36 ( +5.26%) |
| 4 | 997 | 1042 ( -4.51%) | 51 | 39 ( +23.52%) |
| 8 | 1158 | 1050 ( +9.32%) | 89 | 63 ( +29.21%) |
| 16 | 1138 | 1135 ( +0.26%) | 105 | 117 ( -11.42%) |
+--------+-------------+-----------------+-----------+-----------------+
Sleeping Ebizzy
===============
Program to generate workloads resembling web server workloads.
The benchmark is customized to allow for a sleep interval -i
Scale of measurement:
1. Number of records/s
2. systime (s)
Parameters:
1. -m => Always use mmap instead of malloc
2. -M => Never use mmap
3. -S <seconds> => Number of seconds to run
4. -i <interval> => Sleep interval
Machine: IBM POWER 9
+-------------------+-------------+-------------------+-----------+---------------+
| Parameters | TEO records | Wt. TEO records | TEO power | Wt. TEO power |
+-------------------+-------------+-------------------+-----------+---------------+
| -S 60 -i 10000 | 1115000 | 1198081 ( +7.45%) | 149 | 150 ( -0.66%) |
| -m -S 60 -i 10000 | 15879 | 15513 ( -2.30%) | 23 | 22 ( +4.34%) |
| -M -S 60 -i 10000 | 72887 | 77546 ( +6.39%) | 104 | 103 ( +0.96%) |
+-------------------+-------------+-------------------+-----------+---------------+
Hackbench
=========
Creates a specified number of pairs of schedulable entities
which communicate via either sockets or pipes and time how long it
takes for each pair to send data back and forth.
Scale of measurement:
1. Time (s)
2. Power (watts)
Command: Sockets: $ hackbench -l <Messages>
Pipes : $ hackbench --pipe -l <Messages>
Varying parameter: -l
Machine: IBM POWER 9
+----------+------------+-------------------+----------+-------------------+
| Messages | TEO socket | Wt. TEO socket | TEO pipe | Wt. TEO pipe |
+----------+------------+-------------------+----------+-------------------+
| 100 | 0.042 | 0.043 ( -2.32%) | 0.031 | 0.032 ( +3.12%) |
| 1000 | 0.258 | 0.272 ( +5.14%) | 0.301 | 0.312 ( -3.65%) |
| 10000 | 2.397 | 2.441 ( +1.80%) | 5.642 | 5.092 ( +9.74%) |
| 100000 | 23.691 | 23.730 ( -0.16%) | 57.762 | 57.857 ( -0.16%) |
| 1000000 | 234.103 | 233.841 ( +0.11%) | 559.807 | 592.304 ( -5.80%) |
+----------+------------+-------------------+----------+-------------------+
Power :Socket: Consistent between 135-140 watts for both TEO and Wt. TEO
Pipe: Consistent between 125-130 watts for both TEO and Wt. TEO
Could you also provide power measurements for the duration when the
system is completely idle for each of the variants of TEO governor ?
Is it the case that the benefits that we are seeing above are only due
to Wt. TEO being more conservative than TEO governor by always
choosing a shallower state ?
Pratik Rajesh Sampat (1):--
Weighted approach to gather and use history in TEO governor
drivers/cpuidle/governors/teo.c | 95 +++++++++++++++++++++++++++++++--
1 file changed, 90 insertions(+), 5 deletions(-)
--
2.17.1
Thanks and Regards
gautham.