Hi Lukasz,
thanks for your comments, one question below.
On 09/03/2021 11:01, Lukasz Luba wrote:
[ ... ]
+static u64 scale_pd_power_uw(struct cpumask *cpus, u64 power)
I would rename 'cpus' to 'pd_mask', see below
+{
+ unsigned long max, util;
+ int cpu, load = 0;
IMHO 'int load' looks odd when used with 'util' and 'max'.
I would put it in the line above, so that they all have the same
type, and rename it to 'sum_util'.
+
+ for_each_cpu(cpu, cpus) {
I would avoid the temporary CPU mask in the get_pd_power_uw()
with this modified loop:
for_each_cpu_and(cpu, pd_mask, cpu_online_mask) {
+ max = arch_scale_cpu_capacity(cpu);
+ util = sched_cpu_util(cpu, max);
+ load += ((util * 100) / max);
Below you can find 3 optimizations. Since we are not in the hot
path here, it's up to you whether to use all or some of them,
or just ignore them.
1st optimization.
If we use 'load += (util << 10) / max' in the loop, then
we could avoid div by 100 and use a right shift:
(power * load) >> 10
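As a rough illustration of the 1st optimization, here is a user-space
sketch comparing the percentage form with the 1024-based form. The
helper names are made up for this example and are not part of the
patch; it also assumes the original code ends with a division by 100.

```c
#include <stdint.h>

/* Hypothetical helper: load expressed as a percentage, then div by 100. */
static uint64_t scale_power_percent(uint64_t power, uint64_t util, uint64_t max)
{
	uint64_t load = (util * 100) / max;

	return (power * load) / 100;
}

/* Hypothetical helper: load expressed in 1/1024 units, then right shift. */
static uint64_t scale_power_shift(uint64_t power, uint64_t util, uint64_t max)
{
	uint64_t load = (util << 10) / max;

	return (power * load) >> 10;
}
```

With power = 1000000, util = 300 and max = 1024, the percentage form
truncates the load to 29 and gives 290000, while the shift form keeps
the load at 300/1024 and gives 292968, so the shift is not only cheaper
but also loses less precision in this case.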
2nd optimization.
Since we use the EM CPU mask, which spans all CPUs with the same
arch_scale_cpu_capacity(), you can avoid N divs inside the loop
and do it once, below the loop.
3rd optimization.
If we simply add all 'util' into 'sum_util' (no mul or div in
the loop), then we might just have a simple macro:
#define CALC_POWER_USAGE(power, sum_util, max) \
(((power * (sum_util << 10)) / max) >> 10)
I don't understand the division by 'max'. I was expecting something
like this here: ((sum_util << 10) / sum_max) >> 10
no?
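
To make the question concrete, here is a small user-space sketch of
both forms. The function names are hypothetical, and the second form
keeps the 'power' factor, which is my assumption about how the
expression above would be used. If all N CPUs of the performance domain
report the same arch_scale_cpu_capacity(), then sum_max is N * max and
the two results differ by a factor of N.

```c
#include <stdint.h>

/* Form of the proposed macro: divide by the capacity of a single CPU. */
static uint64_t power_div_max(uint64_t power, uint64_t sum_util, uint64_t max)
{
	return ((power * (sum_util << 10)) / max) >> 10;
}

/* Alternative form (assumption): divide by the summed capacity of all CPUs. */
static uint64_t power_div_sum_max(uint64_t power, uint64_t sum_util,
				  uint64_t sum_max)
{
	return ((power * (sum_util << 10)) / sum_max) >> 10;
}
```

For 4 CPUs with max = 1024 and util = 512 each (sum_util = 2048,
sum_max = 4096) and power = 1000, the first form gives 2000 and the
second gives 500. Which one is appropriate would depend on whether
'power' is meant per CPU or for the whole performance domain; the
sketch does not settle that.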