On Wed, Nov 30, 2022 at 4:33 PM Kajetan Puchalski
<kajetan.puchalski@xxxxxxx> wrote:
> Modern interactive systems, such as recent Android phones, tend to have power-
> efficient shallow idle states. Selecting deeper idle states on a device while a
> latency-sensitive workload is running can adversely impact performance due to
> increased latency. Additionally, if the CPU wakes up from a deeper sleep before
> its target residency, as is often the case, energy is wasted as well.
>
> At the moment, all the available idle governors operate mainly based on their
> own past correctness metrics along with timer events without taking into account
> any scheduling information.

I still don't quite agree with the above statement.
It would be accurate enough to state the fact that currently cpuidle
governors don't take scheduling information into account.

> Especially on interactive systems, this results in
> them frequently selecting a deeper idle state and then waking up before its
> target residency is hit, thus leading to increased wakeup latency and lower
> performance with no power saving. For instance, with 'menu' while web browsing
> on Android, those types of wakeups ('too deep') account for over 24% of all
> wakeups.

I don't think that you can convincingly establish a cause-and-effect
relationship between not taking scheduling information into account
and overestimating the idle duration.
It would be just fine to say something like "They also tend to
overestimate the idle duration quite often, which causes them to
select excessively deep idle states, which leads to ...".

> At the same time, on some platforms C0 can be power-efficient enough to warrant
> preferring it over C1.

If you say C0 or C1, a casual reader may think of x86, which
probably is not your intention.
I would say "idle state 0" and "idle state 1" instead. I would also
say that this is on systems where idle state 0 is not a polling state.

> This is because the power usage of the two states
> can be so close that a sufficient number of too-deep C1 sleeps can completely
> offset the C1 power saving, to the point where it would have been more power-
> efficient to just use C0 instead.
>
> Sleeps that happened in C0 while they could have used C1 ('too shallow') only
> save less power than they otherwise could have. Too-deep sleeps, on the other
> hand, harm performance and nullify the potential power saving from using C1 in
> the first place. Taking this into account, it is clear that on balance it
> is preferable for an idle governor to have more too-shallow sleeps than
> too-deep sleeps on those kinds of platforms.

I don't think that the above paragraphs, while generally true, are
relevant for what the patch really does.
They would have been relevant if the patch had improved the
energy efficiency, but it doesn't. It sacrifices energy for
performance by reducing the CPU wakeup latency.

> This patch specifically tunes TEO to minimise too-deep sleeps and minimise
> latency to achieve better performance.

I'm not sure if you can demonstrate that the number of "too deep
sleeps" is really reduced in all cases, but the reduction of latency
is readily demonstrable, so I would focus on that part.

> To this end, before selecting the next
> idle state it uses the avg_util signal of a CPU's runqueue in order to determine
> to what extent the CPU is being utilized. This util value is then compared to a
> threshold defined as a percentage of the CPU's capacity (capacity >> 6, i.e. ~1.5%
> in the current implementation). If the util is above the threshold, the
> idle state selected by TEO metrics will be reduced by 1, thus selecting a
> shallower state. If the util is below the threshold, the governor defaults to
> the TEO metrics mechanism to try to select the deepest available idle state
> based on the closest timer event and its own correctness.
>
> The main goal of this is to reduce latency and increase performance for some
> workloads. Under some workloads (Geekbench 5) it will result in an increase in
> power usage, while under others (PCMark Web, Jankbench, Speedometer) it will
> result in a decrease in power usage compared to TEO.
> It can drastically decrease latency and improve performance in certain
> latency-sensitive workloads.

And I would put some numbers from your cover letter in here.

> + *
> + * When the CPU is utilized while going into idle, more likely than not it will
> + * be woken up to do more work soon and so a shallower idle state should be
> + * selected to minimise latency and maximise performance. When the CPU is not
> + * being utilized, the usual metrics-based approach to selecting the deepest
> + * available idle state should be preferred to take advantage of the power
> + * saving.

I would say "energy saving" instead of "power saving", as the former
is technically more accurate.