On Wed, 11 Sept 2024 at 16:03, Pierre Gondois <pierre.gondois@xxxxxxx> wrote:
Hello Vincent,
On 8/30/24 15:03, Vincent Guittot wrote:
feec() looks for the CPU with highest spare capacity in a PD assuming that
it will be the best CPU from a energy efficiency PoV because it will
require the smallest increase of OPP. Although this is true generally
speaking, this policy also filters some others CPUs which will be as
efficients because of using the same OPP.
In fact, we really care about the cost of the new OPP that will be
selected to handle the waking task. In many cases, several CPUs will end
up selecting the same OPP and as a result using the same energy cost. In
these cases, we can use other metrics to select the best CPU for the same
energy cost.
Rework feec() to look 1st for the lowest cost in a PD and then the most
performant CPU between CPUs.
Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 466 +++++++++++++++++++++++---------------------
1 file changed, 244 insertions(+), 222 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e67d6029b269..2273eecf6086 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
[snip]
-/*
- * compute_energy(): Use the Energy Model to estimate the energy that @pd would
- * consume for a given utilization landscape @eenv. When @dst_cpu < 0, the task
- * contribution is ignored.
- */
-static inline unsigned long
-compute_energy(struct energy_env *eenv, struct perf_domain *pd,
- struct cpumask *pd_cpus, struct task_struct *p, int dst_cpu)
+/*Check if the CPU can handle the waking task */
+static int check_cpu_with_task(struct task_struct *p, int cpu)
{
- unsigned long max_util = eenv_pd_max_util(eenv, pd_cpus, p, dst_cpu);
- unsigned long busy_time = eenv->pd_busy_time;
- unsigned long energy;
+ unsigned long p_util_min = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MIN) : 0;
+ unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
+ unsigned long util_min = p_util_min;
+ unsigned long util_max = p_util_max;
+ unsigned long util = cpu_util(cpu, p, cpu, 0);
+ struct rq *rq = cpu_rq(cpu);
- if (dst_cpu >= 0)
- busy_time = min(eenv->pd_cap, busy_time + eenv->task_busy_time);
I think you should mention that the energy computation is not capped anymore.
It used to be:
pd_util: sum of the CPU's util for the pd considered, without task_util
pd_cap: sum of the CPU's capacity for the pd
(a)
busy_time = min(pd_cap, pd_util);
prev_energy = busy_time * OPP[prev_max_util].cost;
busy_time = min(pd_cap, pd_util + task_util);
new_energy = busy_time * OPP[new_max_util].cost;
delta_energy = new_energy - prev_energy;
Note that when computing busy_time, task_util is not capped to one CPU's
max_cap. This means that in empty pd, a task can 'steal' capacity from
CPUs during the energy computation.
Cf. [1]
and it is now:
(b)
delta_energy = task_util * OPP[new_max_util].cost;
delta_energy += pd_util * (OPP[new_max_util].cost - OPP[prev_max_util].cost);
Note that the busy_time (task_util + pd_util) is now not capped by anything.
As discussed in [1], capping utilization with max capacity is a
mistake because we lost the information that this will run longer