Re: [RFC PATCH 5/5] sched/fair: Add push task callback for EAS

From: Pierre Gondois
Date: Wed Sep 11 2024 - 10:03:25 EST


Hello Vincent,

On 8/30/24 15:03, Vincent Guittot wrote:
EAS is based on wakeup events to efficiently place tasks on the system, but
there are cases where a task will not have wakeup events anymore or at a
far too low pace. For such situation, we can take advantage of the task
being put back in the enqueued list to check if it should be migrated on
another CPU. When the task is the only one running on the CPU, the tick
will check it the task is stuck on this CPU and should migrate on another
one.

Wake up events remain the main way to migrate tasks but we now detect
situation where a task is stuck on a CPU by checking that its utilization
is larger than the max available compute capacity (max cpu capacity or
uclamp max setting)

Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 211 +++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 2 +
2 files changed, 213 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e46af2416159..41fb18ac118b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c

[snip]

+
+static inline void check_misfit_cpu(struct task_struct *p, struct rq *rq)
+{
+ int new_cpu, cpu = cpu_of(rq);
+
+ if (!sched_energy_enabled())
+ return;
+
+ if (WARN_ON(!p))
+ return;
+
+ if (WARN_ON(p != rq->curr))
+ return;
+
+ if (is_migration_disabled(p))
+ return;
+
+ if ((rq->nr_running > 1) || (p->nr_cpus_allowed == 1))
+ return;

I tried the code on a Pixel6 with the following setup:
- without the above (rq->nr_running > 1) condition
- without the push task mechanism
i.e. tasks without regular wakeups only have the opportunity to
run feec() via the sched_tick. It seemed sufficient to avoid
the problematic you mentioned:
- having unbalanced UCLAMP_MAX tasks in a pd, e.g. 1 UCLAMP_MAX task
per little CPU, except one little CPU with N UCLAMP_MAX tasks
- downgrading UCLAMP_MAX tasks that could run on smaller CPUs
but have no wakeups and thus don't run feec()

Thus I was wondering it it would not be better to integrate the
EAS to the load balancer instead (not my idea, but don't remember
who suggested that).
Or otherwise if just running feec() through the sched_tick path
would not be sufficient (i.e. this patch minus the push mechanism).

+
+ if (!task_misfit_cpu(p, cpu))
+ return;
+
+ new_cpu = find_energy_efficient_cpu(p, cpu);
+
+ if (new_cpu == cpu)
+ return;
+
+ /*
+ * ->active_balance synchronizes accesses to
+ * ->active_balance_work. Once set, it's cleared
+ * only after active load balance is finished.
+ */
+ if (!rq->active_balance) {
+ rq->active_balance = 1;
+ rq->push_cpu = new_cpu;
+ } else
+ return;
+
+ raw_spin_rq_unlock(rq);
+ stop_one_cpu_nowait(cpu,
+ active_load_balance_cpu_stop, rq,
+ &rq->active_balance_work);
+ raw_spin_rq_lock(rq);

I didn't hit any error, but isn't it eligible to the following ?
commit f0498d2a54e7 ("sched: Fix stop_one_cpu_nowait() vs hotplug")


+}
+
+static inline int has_pushable_tasks(struct rq *rq)
+{
+ return !plist_head_empty(&rq->cfs.pushable_tasks);
+}
+
+static struct task_struct *pick_next_pushable_fair_task(struct rq *rq)
+{
+ struct task_struct *p;
+
+ if (!has_pushable_tasks(rq))
+ return NULL;
+
+ p = plist_first_entry(&rq->cfs.pushable_tasks,
+ struct task_struct, pushable_tasks);
+
+ WARN_ON_ONCE(rq->cpu != task_cpu(p));
+ WARN_ON_ONCE(task_current(rq, p));
+ WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+
+ WARN_ON_ONCE(!task_on_rq_queued(p));
+
+ /*
+ * Remove task from the pushable list as we try only once after
+ * task has been put back in enqueued list.
+ */
+ plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+
+ return p;
+}
+
+/*
+ * See if the non running fair tasks on this rq
+ * can be sent to some other CPU that fits better with
+ * their profile.
+ */
+static int push_fair_task(struct rq *rq)
+{
+ struct task_struct *next_task;
+ struct rq *new_rq;
+ int prev_cpu, new_cpu;
+ int ret = 0;
+
+ next_task = pick_next_pushable_fair_task(rq);
+ if (!next_task)
+ return 0;
+
+ if (is_migration_disabled(next_task))
+ return 0;
+
+ if (WARN_ON(next_task == rq->curr))
+ return 0;
+
+ /* We might release rq lock */
+ get_task_struct(next_task);
+
+ prev_cpu = rq->cpu;
+
+ new_cpu = find_energy_efficient_cpu(next_task, prev_cpu);
+
+ if (new_cpu == prev_cpu)
+ goto out;
+
+ new_rq = cpu_rq(new_cpu);
+
+ if (double_lock_balance(rq, new_rq)) {


I think it might be necessary to check the following:
if (task_cpu(next_task) != rq->cpu) {
double_unlock_balance(rq, new_rq);
goto out;
}

Indeed I've been hitting the following warnings:
- uclamp_rq_dec_id():SCHED_WARN_ON(!bucket->tasks)
- set_task_cpu()::WARN_ON_ONCE(state == TASK_RUNNING &&
p->sched_class == &fair_sched_class &&
(p->on_rq && !task_on_rq_migrating(p)))
- update_entity_lag()::SCHED_WARN_ON(!se->on_rq)

and it seemed to be caused by the task not being on the initial rq anymore.

+
+ deactivate_task(rq, next_task, 0);
+ set_task_cpu(next_task, new_cpu);
+ activate_task(new_rq, next_task, 0);
+ ret = 1;
+
+ resched_curr(new_rq);
+
+ double_unlock_balance(rq, new_rq);
+ }
+
+out:
+ put_task_struct(next_task);
+
+ return ret;
+}
+

Regards,
Pierre