Re: [PATCH] sched/fair: Prevent cfs_rq from being unthrottled with zero runtime_remaining

From: Hao Jia

Date: Tue Oct 14 2025 - 07:01:35 EST



Hello Aaron,

Thank you for your reply.

On 2025/10/14 17:11, Aaron Lu wrote:
Hi Hao,

On Tue, Oct 14, 2025 at 03:43:10PM +0800, Hao Jia wrote:

Hello Aaron,

On 2025/9/29 15:46, Aaron Lu wrote:
When a cfs_rq is to be throttled, its limbo list should be empty and
that's why there is a warn in tg_throttle_down() for non empty
cfs_rq->throttled_limbo_list.

When running a test with the following hierarchy:

          root
        /      \
        A*     ...
     /  |  \   ...
        B
       /  \
      C*
where both A and C have quota settings, that warning on a non-empty limbo
list is triggered for a cfs_rq of C. Let's call it cfs_rq_c (and ignore the
cpu part of the cfs_rq for the sake of simpler representation).


I encountered a similar warning a while ago and fixed it. I have a question
I'd like to ask. tg_unthrottle_up(cfs_rq_C) calls enqueue_task_fair(p) to
enqueue a task, which requires that the runtime_remaining of task p's entire
task_group hierarchy be greater than 0.

In addition to the case you fixed above, when bandwidth control is running
normally, is it possible that there's a corner case where
cfs_A->runtime_remaining > 0 but cfs_B->runtime_remaining < 0, which
could trigger a similar warning?

Do you mean B also has quota set and cfs_B's runtime_remaining < 0?
In this case, B should be throttled, and since C is a descendant of B it
should also be throttled, i.e. C can't be unthrottled while B is in the
throttled state. Do I understand you correctly?

Yes, both A and B have quota set.

Is the following corner case possible?
With asynchronous unthrottling, by the time we call
unthrottle_cfs_rq(cfs_rq_A), other running entities may have completely
consumed cfs_B->runtime_remaining (cfs_B->runtime_remaining < 0) without
completely consuming cfs_A->runtime_remaining
(cfs_A->runtime_remaining > 0).

So when we call unthrottle_cfs_rq(cfs_rq_A), cfs_A->runtime_remaining > 0,
but if cfs_B->runtime_remaining < 0 at that time, then
enqueue_task_fair(p)->check_enqueue_throttle(cfs_rq_B)->throttle_cfs_rq(cfs_rq_B)
may trigger a warning.

My core question is:
When we call unthrottle_cfs_rq(cfs_rq_A), we only check
cfs_rq_A->runtime_remaining. However,
enqueue_task_fair(p)->enqueue_entity(C->B->A)->check_enqueue_throttle()
requires that runtime_remaining be greater than 0 at every task_group level
of task p.

Can we guarantee this?
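
To make the question concrete, here is a toy userspace model of the
condition I am asking about. It is only a sketch and not kernel code:
struct toy_cfs_rq and toy_first_exhausted_level() are hypothetical names
made up for illustration, and the "quota enabled and runtime_remaining <= 0"
test is my simplified assumption of what check_enqueue_throttle() looks at
on each level.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one level of the hierarchy; NOT the kernel's cfs_rq. */
struct toy_cfs_rq {
	const char *name;
	bool quota_enabled;         /* assumption: this group has a quota set */
	long runtime_remaining;     /* <= 0 means the local budget is used up */
	struct toy_cfs_rq *parent;  /* NULL for the topmost level */
};

/*
 * Walk from the given level towards the root and return the first level
 * that has a quota but no runtime left, or NULL if every level passes.
 * In this model, a non-NULL result is the level that would be throttled
 * again while the task is being enqueued.
 */
struct toy_cfs_rq *toy_first_exhausted_level(struct toy_cfs_rq *rq)
{
	for (; rq; rq = rq->parent) {
		if (rq->quota_enabled && rq->runtime_remaining <= 0)
			return rq;
	}
	return NULL;
}
```

With A at runtime_remaining = 5 and B at runtime_remaining = -1, walking up
from C returns B even though the check on A alone passes, which is the
situation described above.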

Thanks,
Hao


So, I previously tried to fix this issue using the following code, adding
the ENQUEUE_THROTTLE flag to ensure that tasks enqueued in
tg_unthrottle_up() aren't throttled.


Yeah I think this can also fix the warning.
I'm not sure it is a good idea though, because on unthrottle the
expectation is that this cfs_rq has runtime_remaining > 0, and if
that's not the case, I think it is better to know why.

Thanks.