Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call before freq update
From: Xuewen Yan
Date: Tue Mar 25 2025 - 22:57:48 EST
Hi Prateek,
On Wed, Mar 26, 2025 at 12:54 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello Xuewen,
>
> On 3/25/2025 7:17 AM, Xuewen Yan wrote:
> > When a task's uclamp is set, we want the CPU frequency
> > to increase as quickly as possible when the task is enqueued.
> > Because the cpu frequency update happens during enqueue_task(),
> > the rq's uclamp needs to be updated before the task is enqueued,
> > just like util_est.
> > So, align uclamp with util_est and call it before the freq update.
> >
> > For sched-delayed tasks, the rq uclamp/util_est should only be updated
> > when they are enqueued upon being awakened.
> > So simplify the logic of util_est's enqueue/dequeue check.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@xxxxxxxxxx>
> > ---
> > v2:
> > - simplify the util_est en/dequeue check;
> > ---
> > Previous discussion:
> > https://lore.kernel.org/all/CAB8ipk8pEvOtCm-d0o1rsekwxPWUHk9iBGtt9TLTWW-iWTQKiA@xxxxxxxxxxxxxx/
> > https://lore.kernel.org/all/84441660bef0a5e67fd09dc3787178d0276dad31.1740664400.git.hongyan.xia2@xxxxxxx/T/#u
> > https://lore.kernel.org/all/CAB8ipk9LpbiUDnbcV6+59+Sa=Ai7tFzO===mpLD3obNdV4=J-A@xxxxxxxxxxxxxx/T/#u
> > https://lore.kernel.org/all/aa8baf67-a8ec-4ad8-a6a8-afdcd7036771@xxxxxxx/
> > ---
> > kernel/sched/core.c | 17 ++++++++++-------
> > kernel/sched/fair.c | 4 ++--
> > 2 files changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 042351c7afce..72fbe2031e54 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1747,7 +1747,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> > }
> > }
> >
> > -static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> > +static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags)
> > {
> > enum uclamp_id clamp_id;
> >
> > @@ -1763,7 +1763,8 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> > if (unlikely(!p->sched_class->uclamp_enabled))
> > return;
> >
> > - if (p->se.sched_delayed)
> > + /* Only inc a delayed task that is being woken up. */
> > + if (p->se.sched_delayed && !(flags & ENQUEUE_DELAYED))
> > return;
> >
> > for_each_clamp_id(clamp_id)
> > @@ -2031,7 +2032,7 @@ static void __init init_uclamp(void)
> > }
> >
> > #else /* !CONFIG_UCLAMP_TASK */
> > -static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p) { }
> > +static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags) { }
> > static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p) { }
> > static inline void uclamp_fork(struct task_struct *p) { }
> > static inline void uclamp_post_fork(struct task_struct *p) { }
> > @@ -2067,12 +2068,14 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
> > if (!(flags & ENQUEUE_NOCLOCK))
> > update_rq_clock(rq);
> >
> > - p->sched_class->enqueue_task(rq, p, flags);
> > /*
> > - * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
> > - * ->sched_delayed.
> > + * Can be before ->enqueue_task() because uclamp considers the
> > + * ENQUEUE_DELAYED task before its ->sched_delayed gets cleared
> > + * in ->enqueue_task().
> > */
> > - uclamp_rq_inc(rq, p);
> > + uclamp_rq_inc(rq, p, flags);
> > +
> > + p->sched_class->enqueue_task(rq, p, flags);
> >
> > psi_enqueue(p, flags);
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index c798d2795243..c92fee07fb7b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6930,7 +6930,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > * Let's add the task's estimated utilization to the cfs_rq's
> > * estimated utilization, before we update schedutil.
> > */
> > - if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & ENQUEUE_RESTORE))))
> > + if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
> > util_est_enqueue(&rq->cfs, p);
>
> Wouldn't this do a util_est_{dequeue,enqueue}() for a save restore
> operation too of a non-delayed task? Is that desired?
For a delayed task, its util_est should be dequeued/enqueued only
when it sleeps and when it is woken up.
For the save/restore operation, there is no need to enqueue it,
because it is not being woken up.
So the enqueue condition actually is:
if (!p->se.sched_delayed || (p->se.sched_delayed && (flags & ENQUEUE_DELAYED)))
And this is equivalent to:
if (!p->se.sched_delayed || (flags & ENQUEUE_DELAYED))
More details here:
https://lore.kernel.org/all/84441660bef0a5e67fd09dc3787178d0276dad31.1740664400.git.hongyan.xia2@xxxxxxx/T/#ma2505e90489316eb354390b42dee9d053f6fd1e9
>
> On a larger note ...
>
> An enqueue of a delayed task will call requeue_delayed_entity() which
> will only enqueue p->se on its cfs_rq and do an update_load_avg() for
> that cfs_rq alone.
>
> With cgroups enabled, this cfs_rq might not be the root cfs_rq and
> cfs_rq_util_change() will not call cpufreq_update_util() leaving the
> CPU running at the older frequency despite the updated uclamp
> constraints.
>
> I think cfs_rq_util_change() should be called for the root cfs_rq
> when a task is delayed or when it is re-enqueued to re-evaluate
> the uclamp constraints.
I think you're referring to a different issue with the delayed task's
util_est/uclamp.
This issue is unrelated to util_est and uclamp, because even without
these two features, the problem you're mentioning still exists.
Specifically, if the delayed task is not on the root cfs_rq, the CPU
frequency might not be updated in time when the delayed task is
enqueued.
Maybe we could add an update_load_avg() call in clear_delayed() to solve the issue?
-->8--
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a0c4cd26ee07..c75d50dab86b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5435,6 +5435,7 @@ static void clear_delayed(struct sched_entity *se)
 	for_each_sched_entity(se) {
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
+		update_load_avg(cfs_rq, se, UPDATE_TG);
 		cfs_rq->h_nr_runnable++;
 		if (cfs_rq_throttled(cfs_rq))
 			break;
---
BR
xuewen