Re: [PATCH] sched/uclamp: Let each sched_class handle uclamp

From: Hongyan Xia
Date: Thu Mar 06 2025 - 09:41:09 EST


On 06/03/2025 14:26, Hongyan Xia wrote:
On 06/03/2025 13:48, Dietmar Eggemann wrote:
On 06/03/2025 11:53, Hongyan Xia wrote:
On 05/03/2025 18:22, Dietmar Eggemann wrote:
On 27/02/2025 14:54, Hongyan Xia wrote:

[...]

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 857808da23d8..7e5a653811ad 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6941,8 +6941,10 @@ enqueue_task_fair(struct rq *rq, struct
task_struct *p, int flags)
        * Let's add the task's estimated utilization to the cfs_rq's
        * estimated utilization, before we update schedutil.
        */
-    if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags
& ENQUEUE_RESTORE))))
+    if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags
& ENQUEUE_RESTORE)))) {
+        uclamp_rq_inc(rq, p);
           util_est_enqueue(&rq->cfs, p);
+    }

So you want to have p uclamp-enqueued so that its uclamp_min value
counts for the cpufreq_update_util()/cfs_rq_util_change() calls later in
enqueue_task_fair?

    if (p->in_iowait)
      cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);

    enqueue_entity() -> update_load_avg() -> cfs_rq_util_change() ->
    cpufreq_update_util()

But if you do this before requeue_delayed_entity() (1) you will not
uclamp-enqueue p which got his ->sched_delayed just cleared in (1)?

Sorry I'm not sure I'm following. The only condition of the
uclamp_rq_inc() here should be

     if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags &
ENQUEUE_RESTORE))))

Could you elaborate why it doesn't get enqueued?

Let's say 'p->se.sched_delayed = 1' and we are in

enqueue_task()

   enqueue_task_fair()

     if (!(p->se.sched_delayed && ...)

       uclamp_rq_inc(rq, p);

So p wouldn't be included here.

But then p would be requeued in

       requeue_delayed_entity(se)

since you removed the uclamp_rq_inc() from enqueue_task() (after the
p->sched_class->enqueue_task) p will not be considered for uclamp.


I doubt this would be a concern as there are other conditions after checking p->se.sched_delayed. You would only skip the uclamp inc if you are both sched_delayed and meet the second part of the if.

Another reason is that, I think whatever we do should be consistent with what we did for util_est. If util_est also affects cpufreq then I doubt uclamp should be enqueue/dequeued differently, as it would be difficult to argue why sometimes util_est affects cpufreq while uclamp doesn't and why sometimes uclamp does and util_est doesn't.


Sorry what I just said might be a bit misleading. What I wanted to say was that util_est and uclamp should be in sync, but exactly when to enqueue or dequeue these two can be changed.

Do we want to first agree on one thing, which is, should we keep the util_est and uclamp on the rq while a task is being delayed?