Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call before freq update
From: Xuewen Yan
Date: Wed Mar 26 2025 - 07:46:44 EST
Hi Prateek,
On Wed, Mar 26, 2025 at 12:37 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello Xuewen,
>
> On 3/26/2025 8:27 AM, Xuewen Yan wrote:
> > Hi Prateek,
> >
> > On Wed, Mar 26, 2025 at 12:54 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
> >>
> >> Hello Xuewen,
> >>
> >> On 3/25/2025 7:17 AM, Xuewen Yan wrote:
> >>> When a task's uclamp is set, we hope that the CPU frequency
> >>> can increase as quickly as possible when the task is enqueued.
> >>> Because the CPU frequency update happens during enqueue_task(),
> >>> the rq's uclamp needs to be updated before the task is enqueued,
> >>> just like util_est.
>
> I thought the frequency ramp up / ramp down was a problem with
> delayed tasks being requeued.
>
Yes, you are right.
IMHO, perhaps that issue should be fixed separately, since uclamp does
not only affect delayed tasks: the uclamp update should also be placed
before enqueue_task() for other tasks.
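
To illustrate why the ordering matters for any task, here is a toy
userspace model. It is only a sketch and not kernel code: toy_rq,
toy_task and all toy_*() helpers are made-up names, and the "frequency
request" is simply max(util, uclamp_min), which is an assumption made
for illustration.

```c
/* Toy model, not kernel code: every name below is hypothetical. */
struct toy_rq {
	unsigned int util;       /* summed utilization of queued tasks */
	unsigned int uclamp_min; /* aggregated minimum clamp of the rq */
	unsigned int freq_req;   /* last value handed to the governor */
};

struct toy_task {
	unsigned int util;
	unsigned int uclamp_min;
};

static unsigned int toy_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Stand-in for the frequency update that runs during enqueue. */
static void toy_freq_update(struct toy_rq *rq)
{
	rq->freq_req = toy_max(rq->util, rq->uclamp_min);
}

/* Stand-in for folding the task's clamp into the rq's clamp. */
static void toy_uclamp_inc(struct toy_rq *rq, const struct toy_task *p)
{
	rq->uclamp_min = toy_max(rq->uclamp_min, p->uclamp_min);
}

/* Stand-in for the class enqueue, which also updates the frequency. */
static void toy_enqueue(struct toy_rq *rq, const struct toy_task *p)
{
	rq->util += p->util;
	toy_freq_update(rq);
}

/* Clamp applied after enqueue: the request misses the new clamp. */
static unsigned int toy_enqueue_uclamp_late(struct toy_rq *rq,
					    const struct toy_task *p)
{
	toy_enqueue(rq, p);
	toy_uclamp_inc(rq, p);
	return rq->freq_req;
}

/* Clamp applied before enqueue: the request already sees the clamp. */
static unsigned int toy_enqueue_uclamp_early(struct toy_rq *rq,
					     const struct toy_task *p)
{
	toy_uclamp_inc(rq, p);
	toy_enqueue(rq, p);
	return rq->freq_req;
}
```

With an empty rq and a task of util 100 and uclamp_min 800, the "late"
variant requests 100 while the "early" variant requests 800, even
though both end up with the same rq clamp.
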
On the other hand, I also sent a message earlier about the frequency
issue with delayed tasks in iowait:
https://lore.kernel.org/all/20250226114301.4900-1-xuewen.yan@xxxxxxxxxx/
...
> >> On a larger note ...
> >>
> >> An enqueue of a delayed task will call requeue_delayed_entity() which
> >> will only enqueue p->se on its cfs_rq and do an update_load_avg() for
> >> that cfs_rq alone.
> >>
> >> With cgroups enabled, this cfs_rq might not be the root cfs_rq and
> >> cfs_rq_util_change() will not call cpufreq_update_util() leaving the
> >> CPU running at the older frequency despite the updated uclamp
> >> constraints.
> >>
> >> I think cfs_rq_util_change() should be called for the root cfs_rq
> >> when a task is delayed or when it is re-enqueued to re-evaluate
> >> the uclamp constraints.
> >
> > I think you're referring to a different issue with the delayed-task's
> > > util_est/uclamp.
> > This issue is unrelated to util-est and uclamp, because even without
> > these two features, the problem you're mentioning still exists.
> > Specifically, if the delayed-task is not the root CFS task, the CPU
> > frequency might not be updated in time when the delayed-task is
> > enqueued.
> > Maybe we could add the update_load_avg() in clear_delayed to solve the issue?
>
> I thought something like:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a0c4cd26ee07..007b0bb91529 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5473,6 +5473,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  		if (sched_feat(DELAY_DEQUEUE) && delay &&
>  		    !entity_eligible(cfs_rq, se)) {
>  			update_load_avg(cfs_rq, se, 0);
> +			/* Reevaluate frequency since uclamp may have changed */
> +			if (cfs_rq != rq->cfs)
> +				cfs_rq_util_change(rq->cfs, 0);
>  			set_delayed(se);
>  			return false;
>  		}
> @@ -6916,6 +6919,9 @@ requeue_delayed_entity(struct sched_entity *se)
>  	}
>
>  	update_load_avg(cfs_rq, se, 0);
> +	/* Reevaluate frequency since uclamp may have changed */
> +	if (cfs_rq != rq->cfs)
> +		cfs_rq_util_change(rq->cfs, 0);
>  	clear_delayed(se);
>  }
>
> ---
>
> to ensure that schedutil knows about any changes in the uclamp
> constraints both at the first dequeue and at reenqueue.
Because update_load_avg() only calls cfs_rq_util_change() when the
averages have decayed, even a normal task with uclamp does not
necessarily trigger a frequency update when it is enqueued.
If we want to enforce frequency scaling for requeued delayed tasks,
would it be possible to extend this change to trigger a frequency
update for all enqueued tasks?
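
For reference, the non-root case discussed above can be modelled in
userspace roughly as follows. This is only a toy sketch, not kernel
code: model_rq, model_cfs_rq, model_util_change() and model_requeue()
are made-up names. The only property taken from this thread is that a
util change on a non-root cfs_rq does not reach cpufreq, and the model
assumes the util change is always reported (no decay check).

```c
#include <stdbool.h>

/* Toy model, not kernel code: every name below is hypothetical. */
struct model_rq;

struct model_cfs_rq {
	struct model_rq *rq;      /* owning runqueue */
};

struct model_rq {
	struct model_cfs_rq cfs;  /* root cfs_rq, embedded in the rq */
	int freq_updates;         /* how often cpufreq was notified */
};

/* Only the root cfs_rq forwards a util change to cpufreq. */
static void model_util_change(struct model_cfs_rq *cfs_rq)
{
	struct model_rq *rq = cfs_rq->rq;

	if (&rq->cfs == cfs_rq)
		rq->freq_updates++;
}

/*
 * Requeue on cfs_rq. With kick_root set, the root cfs_rq is notified
 * as well when cfs_rq is a group cfs_rq, mirroring the idea of the
 * diff quoted above.
 */
static void model_requeue(struct model_cfs_rq *cfs_rq, bool kick_root)
{
	struct model_rq *rq = cfs_rq->rq;

	model_util_change(cfs_rq);
	if (kick_root && cfs_rq != &rq->cfs)
		model_util_change(&rq->cfs);
}
```

In this model, requeueing on a group cfs_rq without kick_root leaves
freq_updates untouched, while requeueing with kick_root (or directly on
the root cfs_rq) bumps it exactly once.
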
>
> >
> > -->8--
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index a0c4cd26ee07..c75d50dab86b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5435,6 +5435,7 @@ static void clear_delayed(struct sched_entity *se)
> >  	for_each_sched_entity(se) {
> >  		struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >
> > +		update_load_avg(cfs_rq, se, UPDATE_TG);
>
> For finish_delayed_dequeue_entity() calling into clear_delayed(),
> UPDATE_TG would be done already in dequeue_entity().
>
> For requeue, I believe the motivation to skip UPDATE_TG was for
> the entity to compete with its original weight to be picked off
> later.
Okay.
>
---
BR
xuewen