On Thu, 25 Oct 2018 at 12:36, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
I have a couple of questions related to the tests you ran.
On a hikey (octo ARM platform).
Performance cpufreq governor and only the shallowest c-state enabled, to remove
the variance generated by those power features, so we only track the impact of
the PELT algorithm.
So you disabled c-state 'cpu-sleep' and 'cluster-sleep'?
yes
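For reproducibility, here is a minimal userspace sketch of that setup (my own
illustration, not the script actually used; it assumes the standard cpufreq and
cpuidle sysfs layout, 8 CPUs, and that state1/state2 are the cpu-sleep and
cluster-sleep states on this board):

#include <stdio.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return;
	}
	fprintf(f, "%s\n", val);
	fclose(f);
}

int main(void)
{
	char path[128];
	int cpu, state;

	for (cpu = 0; cpu < 8; cpu++) {	/* octo-core hikey assumed */
		/* Pin OPP selection to the performance governor. */
		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
			 cpu);
		write_str(path, "performance");

		/* Disable everything deeper than WFI (state0). */
		for (state = 1; state <= 2; state++) {
			snprintf(path, sizeof(path),
				 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
				 cpu, state);
			write_str(path, "1");
		}
	}
	return 0;
}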
I get 'hisi_thermal f7030700.tsensor: THERMAL ALARM: 66385 > 65000' on
my hikey620. Did you change the thermal configuration? Not sure if there
are any actions attached to this warning though.
I have a fan to ensure that no thermal mitigation will bias the measurement.
Each test runs 16 times.
./perf bench sched pipe
(higher is better)
kernel    tip/sched/core       + patch
          ops/seconds          ops/seconds          diff
cgroup
root      59648(+/- 0.13%)     59785(+/- 0.24%)     +0.23%
level1    55570(+/- 0.21%)     56003(+/- 0.24%)     +0.78%
level2    52100(+/- 0.20%)     52788(+/- 0.22%)     +1.32%
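The root/level1/level2 rows above refer to where the benchmark runs in the
cpu-controller hierarchy. The thread does not show the actual script, but as a
hypothetical illustration (cgroup v1 assumed, mounted at /sys/fs/cgroup/cpu;
"level2" would simply nest one directory deeper), running one level down could
look like this:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
	const char *grp = "/sys/fs/cgroup/cpu/level1";
	char procs[128];
	FILE *f;

	/* Create the nested task group (already exists on reruns). */
	mkdir(grp, 0755);

	/* Move this task (and thus its children) into the group. */
	snprintf(procs, sizeof(procs), "%s/cgroup.procs", grp);
	f = fopen(procs, "w");
	if (!f) {
		perror("cgroup.procs");
		return 1;
	}
	fprintf(f, "%d\n", getpid());
	fclose(f);

	/* Run the benchmark from inside the group. */
	execlp("perf", "perf", "bench", "sched", "pipe", (char *)NULL);
	perror("execlp");
	return 1;
}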
hackbench -l 1000
Shouldn't this be '-l 100'?
I have re-checked and it's -l 1000.
(lower is better)
kernel    tip/sched/core       + patch
          duration(sec)        duration(sec)        diff
cgroup
root      4.472(+/- 1.86%)     4.346(+/- 2.74%)     -2.80%
level1    5.039(+/- 11.05%)    4.662(+/- 7.57%)     -7.47%
level2    5.195(+/- 10.66%)    4.877(+/- 8.90%)     -6.12%
With this new algorithm, the responsiveness of PELT is improved when the CPU is
not running at max capacity. I have put below some examples of the time needed
to reach typical utilization values according to the capacity of the CPU, with
the current implementation and with this patch.
Util (%)      max capacity    half capacity(mainline)    half capacity(w/ patch)
972 (95%)     138ms           not reachable              276ms
486 (47.5%)   30ms            138ms                      60ms
256 (25%)     13ms            32ms                       26ms
Could you describe these testcases in more detail?
You don't need to run a test case. These numbers are computed from the PELT
geometric series and the 32ms half-life period value; see the sketch below.
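Here is a small standalone sketch (mine, derived from the description above,
not code from the patch) that reproduces the durations in the table from the
32ms half-life geometric series, give or take a millisecond of rounding.
"Mainline" models the current contribution scaling (util converges to the
capacity), "time scaling" models this patch (elapsed time is stretched by
max_cap/cap, i.e. by 2 at half capacity):

#include <math.h>
#include <stdio.h>

#define HALF_LIFE_MS	32.0
#define MAX_UTIL	1024.0

/* Time (ms) for util to ramp from 0 towards 'limit' until it crosses 'target'. */
static double ramp_ms(double target, double limit)
{
	if (target >= limit)
		return INFINITY;	/* never reaches the target */
	return HALF_LIFE_MS * log2(limit / (limit - target));
}

static void col(double ms)
{
	if (isinf(ms))
		printf("  %-14s", "not reachable");
	else
		printf("  %-11.0fms", ms);
}

int main(void)
{
	const double targets[] = { 972, 486, 256 };
	int i;

	printf("util    max cap        half cap (mainline)  half cap (time scaling)\n");
	for (i = 0; i < 3; i++) {
		double u = targets[i];

		printf("%4.0f", u);
		/* running at max capacity */
		col(ramp_ms(u, MAX_UTIL));
		/* mainline: contributions scaled by capacity, util converges to 512 */
		col(ramp_ms(u, MAX_UTIL / 2));
		/* patch: time stretched by max_cap/cap = 2 at half capacity */
		col(2 * ramp_ms(u, MAX_UTIL));
		printf("\n");
	}
	return 0;
}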
What's the initial utilization value of t1? I assume t1 starts with
utilization=512 (post_init_entity_util_avg()).
On my hikey (octo ARM platform) with the schedutil governor, the time to reach
the max OPP when starting from a null utilization decreases from 223ms with the
current scale invariance down to 121ms with the new algorithm. For this test, I
have enabled arch_scale_freq for arm64.
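For context on why a faster util ramp shortens the time to reach the max OPP:
schedutil requests roughly freq = 1.25 * max_freq * util / max (this mirrors
get_next_freq() in kernel/sched/cpufreq_schedutil.c when frequency invariance
is in use), so the max OPP is selected once util crosses about 80% of the CPU
capacity. A small sketch of that mapping, with an assumed max frequency:

#include <stdio.h>

/* next_freq = (max_freq + max_freq/4) * util / max_cap */
static unsigned long next_freq(unsigned long util, unsigned long max_cap,
			       unsigned long max_freq)
{
	return (max_freq + (max_freq >> 2)) * util / max_cap;
}

int main(void)
{
	unsigned long max_freq = 1200000;	/* kHz; value assumed for illustration */
	unsigned long max_cap = 1024;
	unsigned long util;

	for (util = 256; util <= 1024; util += 256)
		printf("util=%4lu -> requested freq %lu kHz%s\n",
		       util, next_freq(util, max_cap, max_freq),
		       next_freq(util, max_cap, max_freq) >= max_freq ?
		       " (clamped to max OPP)" : "");
	return 0;
}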
Isn't the arch-specific arch_scale_freq_capacity() enabled by default on
arm64 with cpufreq support?
Yes, that's a remnant of a previous version, written before arch_scale_freq
support was merged.