Re: [RFC PATCH 5/5] sched/fair: Add push task callback for EAS

From: Pierre Gondois
Date: Fri Sep 13 2024 - 12:08:42 EST


Hello Vincent,

On 8/30/24 15:03, Vincent Guittot wrote:
EAS is based on wakeup events to efficiently place tasks on the system, but
there are cases where a task will not have wakeup events anymore or at a
far too low pace. For such situations, we can take advantage of the task
being put back in the enqueued list to check if it should be migrated to
another CPU. When the task is the only one running on the CPU, the tick
will check if the task is stuck on this CPU and should migrate to another
one.

Wakeup events remain the main way to migrate tasks but we now detect
situations where a task is stuck on a CPU by checking that its utilization
is larger than the max available compute capacity (max cpu capacity or
uclamp max setting).

Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 211 +++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 2 +
2 files changed, 213 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e46af2416159..41fb18ac118b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c

[...]

+
+static inline void check_misfit_cpu(struct task_struct *p, struct rq *rq)
+{
+ int new_cpu, cpu = cpu_of(rq);
+
+ if (!sched_energy_enabled())
+ return;
+
+ if (WARN_ON(!p))
+ return;
+
+ if (WARN_ON(p != rq->curr))
+ return;
+
+ if (is_migration_disabled(p))
+ return;
+
+ if ((rq->nr_running > 1) || (p->nr_cpus_allowed == 1))

If the goal is to detect tasks that should be migrated to bigger CPUs,
couldn't the check be changed from:
- (p->nr_cpus_allowed == 1)
to
- (p->max_allowed_capacity == arch_scale_cpu_capacity(cpu))
to avoid the case where a task is bound to the little cluster, for instance?

Similar question for update_misfit_status(). Doesn't:
- (arch_scale_cpu_capacity(cpu) == p->max_allowed_capacity)
already include this case:
- (p->nr_cpus_allowed == 1)
?
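
To make the two questions above concrete, here is a rough userspace
model, not kernel code. It assumes that p->max_allowed_capacity is the
largest capacity among the CPUs the task is allowed to run on. The CPU
layout, the capacity values, the bitmask affinity and all function names
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical 4 little + 4 big layout; capacity values are illustrative. */
static const unsigned long cpu_capacity[NR_CPUS] = {
	160, 160, 160, 160, 1024, 1024, 1024, 1024
};

/* Assumed model of p->max_allowed_capacity: max capacity over allowed CPUs. */
static unsigned long max_allowed_capacity(unsigned int allowed_mask)
{
	unsigned long max = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((allowed_mask & (1u << cpu)) && cpu_capacity[cpu] > max)
			max = cpu_capacity[cpu];
	return max;
}

/* Model of p->nr_cpus_allowed: number of bits set in the affinity mask. */
static int nr_cpus_allowed(unsigned int allowed_mask)
{
	int nr = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (allowed_mask & (1u << cpu))
			nr++;
	return nr;
}

/* Skip condition as written in the patch (affinity part only). */
static bool skip_if_single_cpu(unsigned int allowed_mask)
{
	return nr_cpus_allowed(allowed_mask) == 1;
}

/* Skip condition suggested in this reply: no bigger CPU is allowed. */
static bool skip_if_at_max_allowed_capacity(unsigned int allowed_mask, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return max_allowed_capacity(allowed_mask) == cpu_capacity[cpu];
}
```

With an affinity mask of 0x0F (little cluster only) and the task on CPU0,
the first check does not skip the task while the second one does. With a
single-CPU mask such as 0x01, both checks skip it, which is why the
capacity comparison looks like a superset of the nr_cpus_allowed one in
this model.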


+ return;
+
+ if (!task_misfit_cpu(p, cpu))
+ return;

task_misfit_cpu() intends to check whether the task will have an opportunity
to run feec() through wakeups/push-pull.

Shouldn't we also check whether the task fits the CPU with the 20% margin,
using task_fits_cpu()? This would allow migrating the task sooner than the
load balancer would.
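
As a rough illustration of the window this question is about, here is a
userspace sketch, not kernel code. fits_with_margin() uses the same
arithmetic as the kernel's fits_capacity() macro, but ignores the uclamp
handling that task_fits_cpu() also does. exceeds_capacity() is only a
hypothetical stand-in for the "utilization larger than the max available
compute capacity" check described in the commit message, since the body
of task_misfit_cpu() is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Same arithmetic as fits_capacity(): util must stay below ~80% of capacity. */
static bool fits_with_margin(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/* Hypothetical stand-in: task only flagged once util is above the capacity. */
static bool exceeds_capacity(unsigned long util, unsigned long capacity)
{
	return util > capacity;
}

/* True when only the margin-based check would flag the task. */
static bool only_margin_check_triggers(unsigned long util, unsigned long capacity)
{
	return !fits_with_margin(util, capacity) &&
	       !exceeds_capacity(util, capacity);
}
```

For a little CPU of capacity 160 (an illustrative value), a task with a
utilization of 140 no longer fits with the margin but does not exceed the
capacity yet, so in this model only the margin-based check would trigger
a migration attempt.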


+
+ new_cpu = find_energy_efficient_cpu(p, cpu);
+
+ if (new_cpu == cpu)
+ return;
+
+ /*
+ * ->active_balance synchronizes accesses to
+ * ->active_balance_work. Once set, it's cleared
+ * only after active load balance is finished.
+ */
+ if (!rq->active_balance) {
+ rq->active_balance = 1;
+ rq->push_cpu = new_cpu;
+ } else
+ return;
+
+ raw_spin_rq_unlock(rq);
+ stop_one_cpu_nowait(cpu,
+ active_load_balance_cpu_stop, rq,
+ &rq->active_balance_work);
+ raw_spin_rq_lock(rq);
+}
+

Regards,
Pierre