Re: [PATCH v2] sched/uclamp: Align uclamp and util_est and call before freq update

From: Dietmar Eggemann
Date: Tue Apr 01 2025 - 04:32:03 EST


On 26/03/2025 12:46, Xuewen Yan wrote:
> Hi Prateek,
>
> On Wed, Mar 26, 2025 at 12:37 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>>
>> Hello Xuewen,
>>
>> On 3/26/2025 8:27 AM, Xuewen Yan wrote:
>>> Hi Prateek,
>>>
>>> On Wed, Mar 26, 2025 at 12:54 AM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>>>>
>>>> Hello Xuewen,
>>>>
>>>> On 3/25/2025 7:17 AM, Xuewen Yan wrote:

[...]

>>>> If think cfs_rq_util_change() should be called for the root cfs_rq
>>>> when a task is delayed or when it is re-enqueued to re-evaluate
>>>> the uclamp constraints.
>>>
>>> I think you're referring to a different issue with the delayed-task's
>>> util_ets/uclamp.
>>> This issue is unrelated to util-est and uclamp, because even without
>>> these two features, the problem you're mentioning still exists.
>>> Specifically, if the delayed-task is not the root CFS task, the CPU
>>> frequency might not be updated in time when the delayed-task is
>>> enqueued.
>>> Maybe we could add the update_load_avg() in clear_delayed to solve the issue?
>>
>> I thought something like:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index a0c4cd26ee07..007b0bb91529 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5473,6 +5473,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>> if (sched_feat(DELAY_DEQUEUE) && delay &&
>> !entity_eligible(cfs_rq, se)) {
>> update_load_avg(cfs_rq, se, 0);
>> + /* Reevaluate frequency since uclamp may have changed */
>> + if (cfs_rq != rq->cfs)
>> + cfs_rq_util_change(rq->cfs, 0);
>> set_delayed(se);
>> return false;
>> }
>> @@ -6916,6 +6919,9 @@ requeue_delayed_entity(struct sched_entity *se)
>> }
>>
>> update_load_avg(cfs_rq, se, 0);
>> + /* Reevaluate frequency since uclamp may have changed */
>> + if (cfs_rq != rq->cfs)
>> + cfs_rq_util_change(rq->cfs, 0);
>> clear_delayed(se);
>> }
>>
>> ---
>>
>> to ensure that schedutil knows about any changes in the uclamp
>> constraints at the first dequeue, at reenqueue.
>
> Because of the decay of update_load_avg(), for a normal task with
> uclamp, it doesn't necessarily trigger frequency update when enqueued.
> If we want to enforce frequency scaling for requeued delayed-tasks,
> would it be possible to extend this change to trigger frequency update
> for all enqueued tasks?

But IMHO this is not what we want to achieve here? Instead, we want that
the uclamp values of a just enqueued p_1 with:

'p_1->se.sched_delayed && !(flags & ENQUEUE_DELAYED)'

possibly count in CPU frequency settings of p_2 via:

enqueue_entity(..., &p_2->se, ...) -> update_load_avg() ->
if(decayed)cfs_rq_util_change()

e.g. for shared frequency domain:

-> sugov_update_shared() -> sugov_next_freq_shared() -> sugov_get_util()
-> effective_cpu_util(..., &min, &max)

uclamp is about applying the max value of all enqueued tasks.

[...]