Re: [PATCH] sched/uclamp: Let each sched_class handle uclamp

From: Hongyan Xia
Date: Thu Mar 06 2025 - 09:26:43 EST


On 06/03/2025 13:48, Dietmar Eggemann wrote:
On 06/03/2025 11:53, Hongyan Xia wrote:
On 05/03/2025 18:22, Dietmar Eggemann wrote:
On 27/02/2025 14:54, Hongyan Xia wrote:

[...]

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 857808da23d8..7e5a653811ad 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6941,8 +6941,10 @@ enqueue_task_fair(struct rq *rq, struct
task_struct *p, int flags)
       * Let's add the task's estimated utilization to the cfs_rq's
       * estimated utilization, before we update schedutil.
       */
-    if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags
& ENQUEUE_RESTORE))))
+    if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags
& ENQUEUE_RESTORE)))) {
+        uclamp_rq_inc(rq, p);
          util_est_enqueue(&rq->cfs, p);
+    }

So you want to have p uclamp-enqueued so that its uclamp_min value
counts for the cpufreq_update_util()/cfs_rq_util_change() calls later in
enqueue_task_fair?

   if (p->in_iowait)
     cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);

   enqueue_entity() -> update_load_avg() -> cfs_rq_util_change() ->
   cpufreq_update_util()

But if you do this before requeue_delayed_entity() (1) you will not
uclamp-enqueue p which got his ->sched_delayed just cleared in (1)?

Sorry I'm not sure I'm following. The only condition of the
uclamp_rq_inc() here should be

    if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags &
ENQUEUE_RESTORE))))

Could you elaborate why it doesn't get enqueued?

Let's say 'p->se.sched_delayed = 1' and we are in

enqueue_task()

enqueue_task_fair()

if (!(p->se.sched_delayed && ...)

uclamp_rq_inc(rq, p);

So p wouldn't be included here.

But then p would be requeued in

requeue_delayed_entity(se)

since you removed the uclamp_rq_inc() from enqueue_task() (after the
p->sched_class->enqueue_task) p will not be considered for uclamp.


I doubt this would be a concern as there are other conditions after checking p->se.sched_delayed. You would only skip the uclamp inc if you are both sched_delayed and meet the second part of the if.

Another reason is that, I think whatever we do should be consistent with what we did for util_est. If util_est also affects cpufreq then I doubt uclamp should be enqueue/dequeued differently, as it would be difficult to argue why sometimes util_est affects cpufreq while uclamp doesn't and why sometimes uclamp does and util_est doesn't.