Re: [PATCH] sched/uclamp: Let each sched_class handle uclamp

From: Hongyan Xia
Date: Tue Mar 11 2025 - 10:04:40 EST


On 10/03/2025 12:24, Dietmar Eggemann wrote:
On 10/03/2025 12:56, Hongyan Xia wrote:
On 10/03/2025 11:22, Dietmar Eggemann wrote:
On 10/03/2025 12:03, Xuewen Yan wrote:
Hi Dietmar,

On Mon, Mar 10, 2025 at 6:53 PM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:

On 10/03/2025 03:41, Xuewen Yan wrote:
On Sat, Mar 8, 2025 at 2:32 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:

On 06/03/2025 13:01, Xuewen Yan wrote:
On Thu, Mar 6, 2025 at 2:24 AM Dietmar Eggemann
<dietmar.eggemann@xxxxxxx> wrote:

On 27/02/2025 14:54, Hongyan Xia wrote:

[...]

I submitted a patch similar to yours before:

https://lore.kernel.org/all/CAB8ipk_AvaOWp9QhmnFDdbFSWcKLhCH151=no6kRO2z+pSJfyQ@xxxxxxxxxxxxxx/

And Hongyan fears that as more complexity goes into each sched_class
like delayed dequeue,
so it's better to just let the sched_class handle how uclamp is
enqueued and dequeued within itself rather than leaking into core.c.

Ah, OK. Your patch didn't have 'sched' in the subject so I didn't see it
immediately.

I would prefer that uclamp stays in core.c. ENQUEUE_DELAYED among all
the other flags is already used there (ttwu_runnable()).

task_struct contains  sched_{,rt_,dl_}entity}. We just have to be
careful when switching policies.

I lean towards letting each class handle uclamp. We've seen the trouble
with delayed dequeue. Just like the if condition we have for util_est,
if uclamp is in each class then we can re-use the condition easily,
otherwise we need to carefully synchronize the enqueue/dequeue between
core.c and the sub class.

Also I think so far we are assuming delayed dequeue is the only trouble
maker. If RT and sched_ext have their own corner cases (I think maybe
sched_ext is likely because it may eventually want the ext scheduler to
be able to decide on uclamp itself) then the uclamp inc/dec in core.c
need to cater for that as well. Once a task is in a class, the variables
in another class may be in an undefined state, so checking corner cases
for all the sub-classes in a centralized place like core.c may not even
be easy to get right.

I do understand your concern with sched_ext but I still prefer the less
invasive change to get uclamp & util_est aligned for fair.c aligned.

AFAICS, related to policy changes, we have a
SCHED_WARN_ON(p->se.sched_delayed) in switched_to_fair() so far.

I'm okay with staying in core.c, but care is needed to make sure uclamp inc and dec are balanced. We have been bitten by the unbalanced uclamp and util_est during delayed dequeue fixes and we finally made sure util_est is balanced (at least so far it seems). Reusing the same logic and moving uclamp into fair.c makes sure we don't get bitten again and is one reason why I prefer moving into each class.