Re: [PATCH] sched/uclamp: Let each sched_class handle uclamp
From: Dietmar Eggemann
Date: Mon Mar 10 2025 - 08:24:59 EST
On 10/03/2025 12:56, Hongyan Xia wrote:
> On 10/03/2025 11:22, Dietmar Eggemann wrote:
>> On 10/03/2025 12:03, Xuewen Yan wrote:
>>> Hi Dietmar,
>>>
>>> On Mon, Mar 10, 2025 at 6:53 PM Dietmar Eggemann
>>> <dietmar.eggemann@xxxxxxx> wrote:
>>>>
>>>> On 10/03/2025 03:41, Xuewen Yan wrote:
>>>>> On Sat, Mar 8, 2025 at 2:32 AM Dietmar Eggemann
>>>>> <dietmar.eggemann@xxxxxxx> wrote:
>>>>>>
>>>>>> On 06/03/2025 13:01, Xuewen Yan wrote:
>>>>>>> On Thu, Mar 6, 2025 at 2:24 AM Dietmar Eggemann
>>>>>>> <dietmar.eggemann@xxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On 27/02/2025 14:54, Hongyan Xia wrote:
[...]
>>> I submitted a patch similar to yours before:
>>>
>>> https://lore.kernel.org/all/CAB8ipk_AvaOWp9QhmnFDdbFSWcKLhCH151=no6kRO2z+pSJfyQ@xxxxxxxxxxxxxx/
>>>
>>> And Hongyan fears that as more complexity goes into each sched_class
>>> like delayed dequeue,
>>> so it's better to just let the sched_class handle how uclamp is
>>> enqueued and dequeued within itself rather than leaking into core.c.
>>
>> Ah, OK. Your patch didn't have 'sched' in the subject so I didn't see it
>> immediately.
>>
>> I would prefer that uclamp stays in core.c. ENQUEUE_DELAYED among all
>> the other flags is already used there (ttwu_runnable()).
>>
>> task_struct contains sched_{,rt_,dl_}entity}. We just have to be
>> careful when switching policies.
>
> I lean towards letting each class handle uclamp. We've seen the trouble
> with delayed dequeue. Just like the if condition we have for util_est,
> if uclamp is in each class then we can re-use the condition easily,
> otherwise we need to carefully synchronize the enqueue/dequeue between
> core.c and the sub class.
>
> Also I think so far we are assuming delayed dequeue is the only trouble
> maker. If RT and sched_ext have their own corner cases (I think maybe
> sched_ext is likely because it may eventually want the ext scheduler to
> be able to decide on uclamp itself) then the uclamp inc/dec in core.c
> need to cater for that as well. Once a task is in a class, the variables
> in another class may be in an undefined state, so checking corner cases
> for all the sub-classes in a centralized place like core.c may not even
> be easy to get right.
I do understand your concern with sched_ext but I still prefer the less
invasive change to get uclamp & util_est aligned for fair.c aligned.
AFAICS, related to policy changes, we have a
SCHED_WARN_ON(p->se.sched_delayed) in switched_to_fair() so far.