Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call before freq update

From: K Prateek Nayak
Date: Tue Mar 25 2025 - 12:58:07 EST


Hello Xuewen,

On 3/25/2025 7:17 AM, Xuewen Yan wrote:
When task's uclamp is set, we hope that the CPU frequency
can increase as quickly as possible when the task is enqueued.
Because the cpu frequency updating happens during the enqueue_task(),
so the rq's uclamp needs to be updated before the task is enqueued,
just like util_est.
So, aline the uclamp and util_est and call before freq update.

For sched-delayed tasks, the rq uclamp/util_est should only be updated
when they are enqueued upon being awakened.
So simply the logic of util_est's enqueue/dequeue check.

Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
---
v2:
- simply the util-est's en/dequeue check;
---
Previous discussion:
https://lore.kernel.org/all/CAB8ipk8pEvOtCm-d0o1rsekwxPWUHk9iBGtt9TLTWW-iWTQKiA@xxxxxxxxxxxxxx/
https://lore.kernel.org/all/84441660bef0a5e67fd09dc3787178d0276dad31.1740664400.git.hongyan.xia2@xxxxxxx/T/#u
https://lore.kernel.org/all/CAB8ipk9LpbiUDnbcV6+59+Sa=Ai7tFzO===mpLD3obNdV4=J-A@xxxxxxxxxxxxxx/T/#u
https://lore.kernel.org/all/aa8baf67-a8ec-4ad8-a6a8-afdcd7036771@xxxxxxx/
---
kernel/sched/core.c | 17 ++++++++++-------
kernel/sched/fair.c | 4 ++--
2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 042351c7afce..72fbe2031e54 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1747,7 +1747,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
}
}
-static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
+static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags)
{
enum uclamp_id clamp_id;
@@ -1763,7 +1763,8 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
if (unlikely(!p->sched_class->uclamp_enabled))
return;
- if (p->se.sched_delayed)
+ /* Only inc the delayed task which being woken up. */
+ if (p->se.sched_delayed && !(flags & ENQUEUE_DELAYED))
return;
for_each_clamp_id(clamp_id)
@@ -2031,7 +2032,7 @@ static void __init init_uclamp(void)
}
#else /* !CONFIG_UCLAMP_TASK */
-static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
+static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags) { }
static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
static inline void uclamp_fork(struct task_struct *p) { }
static inline void uclamp_post_fork(struct task_struct *p) { }
@@ -2067,12 +2068,14 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
if (!(flags & ENQUEUE_NOCLOCK))
update_rq_clock(rq);
- p->sched_class->enqueue_task(rq, p, flags);
/*
- * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
- * ->sched_delayed.
+ * Can be before ->enqueue_task() because uclamp considers the
+ * ENQUEUE_DELAYED task before its ->sched_delayed gets cleared
+ * in ->enqueue_task().
*/
- uclamp_rq_inc(rq, p);
+ uclamp_rq_inc(rq, p, flags);
+
+ p->sched_class->enqueue_task(rq, p, flags);
psi_enqueue(p, flags);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c798d2795243..c92fee07fb7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6930,7 +6930,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* Let's add the task's estimated utilization to the cfs_rq's
* estimated utilization, before we update schedutil.
*/
- if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & ENQUEUE_RESTORE))))
+ if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
util_est_enqueue(&rq->cfs, p);

Wouldn't this do a util_est_{dequeue,enqueue}() for a save restore
operation too of a non-delayed task? Is that desired?

On a larger note ...

An enqueue of a delayed task will call requeue_delayed_entity() which
will only enqueue p->se on its cfs_rq and do an update_load_avg() for
that cfs_rq alone.

With cgroups enabled, this cfs_rq might not be the root cfs_rq and
cfs_rq_util_change() will not call cpufreq_update_util() leaving the
CPU running at the older frequency despite the updated uclamp
constraints.

If think cfs_rq_util_change() should be called for the root cfs_rq
when a task is delayed or when it is re-enqueued to re-evaluate
the uclamp constraints.

Please let me know if I missed something.

if (flags & ENQUEUE_DELAYED) {
@@ -7168,7 +7168,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
*/
static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
- if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & DEQUEUE_SAVE))))
+ if (!p->se.sched_delayed)
util_est_dequeue(&rq->cfs, p);
util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);

--
Thanks and Regards,
Prateek