Re: [PATCH 5/5] sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()
From: K Prateek Nayak
Date: Tue Jun 02 2026 - 03:04:07 EST
Hello Peter,
On 6/1/2026 7:18 PM, Peter Zijlstra wrote:
> On Thu, May 28, 2026 at 09:48:30AM +0000, K Prateek Nayak wrote:
>
>> @@ -9893,8 +9882,15 @@ static struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
>> /* Might not have done put_prev_entity() */
>> if (cfs_rq->curr && cfs_rq->curr->on_rq)
>> update_curr(cfs_rq);
>> -
>> - throttled |= check_cfs_rq_runtime(cfs_rq);
>> + /*
>> + * For the current hierarchy, update_curr() above would
>> + * have set the throttle indicators if the cfs_rq has
>> + * run out of bandwidth. For others, enqueue / last
>> + * update_curr() for the cfs_rq would have ensured the
>> + * throttle indicators are set if bandwidth was not
>> + * available.
>> + */
>> + throttled |= cfs_rq_throttled(cfs_rq);
>>
>> se = pick_next_entity(rq, cfs_rq, true);
>> if (!se)
>
>> @@ -15074,15 +15070,19 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
>> static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
>> {
>> struct sched_entity *se = &p->se;
>> + bool throttled = false;
>>
>> for_each_sched_entity(se) {
>> struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>
>> set_next_entity(cfs_rq, se, first);
>> /* ensure bandwidth has been allocated on our new cfs_rq */
>> - account_cfs_rq_runtime(cfs_rq, 0);
>> + throttled |= account_cfs_rq_runtime(cfs_rq, 0);
>> }
>>
>> + if (throttled)
>> + task_throttle_setup_work(p);
>> +
>> __set_next_task_fair(rq, p, first);
>> }
>
> (noticed while trying to rebase flat on top)
>
> Why do we have both? Isn't just set_next_task_fair(.first=true)
> sufficient?
I misread this bit and only called account_cfs_rq_runtime() for !first in my
v2 [1], but even with the following change on top of v2 I see similar results
in my testing (slightly better on performance), so feel free to squash this
bit into the last patch:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ce5cf494b934..fa8c0b1a1cf1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9892,15 +9892,6 @@ struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
/* Might not have done put_prev_entity() */
if (cfs_rq->curr && cfs_rq->curr->on_rq)
update_curr(cfs_rq);
- /*
- * For the current hierarchy, update_curr() above would
- * have set the throttle indicators if the cfs_rq has
- * run out of bandwidth. For others, enqueue / last
- * update_curr() for the cfs_rq would have ensured the
- * throttle indicators are set if bandwidth was not
- * available.
- */
- throttled |= cfs_rq_throttled(cfs_rq);
se = pick_next_entity(rq, cfs_rq, true);
if (!se)
@@ -15012,12 +15003,8 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
break;
set_next_entity(cfs_rq, se, first);
- /*
- * Ensure bandwidth has been allocated on our new cfs_rq
- * if we've reached here for reasons other than pick.
- */
- if (!first)
- throttled |= account_cfs_rq_runtime(cfs_rq, 0);
+ /* ensure bandwidth has been allocated on our new cfs_rq */
+ throttled |= account_cfs_rq_runtime(cfs_rq, 0);
}
if (throttled)
---
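For anyone following along, the accumulate-then-act shape of the loop in the
hunk above can be modelled in userspace. This is only a toy sketch under
stated assumptions: every toy_* name is hypothetical and not kernel API, and
unlike the real account_cfs_rq_runtime() the helper here never tries to
refill runtime from the global pool. It only shows why OR-ing the per-level
result lets the caller set up the throttle work once, after the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq: remaining runtime plus a throttle flag. */
struct toy_cfs_rq {
	long runtime_remaining;		/* <= 0 means no bandwidth left */
	bool throttled;
	struct toy_cfs_rq *parent;	/* NULL at the root of the hierarchy */
};

/*
 * Hypothetical stand-in for account_cfs_rq_runtime(): charge @delta and
 * report whether this level is throttled. No refill is attempted here.
 */
static bool toy_account_runtime(struct toy_cfs_rq *cfs_rq, long delta)
{
	assert(delta >= 0);

	cfs_rq->runtime_remaining -= delta;
	if (cfs_rq->runtime_remaining <= 0)
		cfs_rq->throttled = true;

	return cfs_rq->throttled;
}

/*
 * Walk from the leaf to the root, OR-ing the per-level result. The return
 * value tells the caller whether the deferred throttle work is needed;
 * in this model that decision is made once, not per level.
 */
static bool toy_set_next_task(struct toy_cfs_rq *leaf)
{
	bool throttled = false;

	for (struct toy_cfs_rq *cfs_rq = leaf; cfs_rq; cfs_rq = cfs_rq->parent)
		throttled |= toy_account_runtime(cfs_rq, 0);

	return throttled;
}
```

In this model a leaf with runtime left still reports throttled when any
ancestor has run dry, which is the case the unconditional call is meant to
catch.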
My mind is taking a while to grasp the ->pick_next_task() removal.
[1] https://lore.kernel.org/lkml/20260602050005.11160-1-kprateek.nayak@xxxxxxx/
--
Thanks and Regards,
Prateek