Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle

From: Chengming Zhou
Date: Fri Mar 14 2025 - 07:07:46 EST


On 2025/3/14 17:42, Aaron Lu wrote:
On Fri, Mar 14, 2025 at 04:39:41PM +0800, Chengming Zhou wrote:
On 2025/3/13 15:21, Aaron Lu wrote:
From: Valentin Schneider <vschneid@xxxxxxxxxx>

Once a cfs_rq gets throttled, for all tasks belonging to this cfs_rq,
add a task work to them so that when those tasks return to user, the
actual throttle/dequeue can happen.

Note that since the throttle/dequeue always happens on a task basis when
it returns to user, it's no longer necessary for check_cfs_rq_runtime()
to return a value and for pick_task_fair() to act differently according
to that return value, so check_cfs_rq_runtime() is changed to not return
a value.

Previously, with per-cfs_rq throttling, we used the update_curr() -> put() path
to throttle the cfs_rq and dequeue it from the cfs_rq tree.

Now with your per-task throttling, maybe things can become simpler: we can
just call throttle_cfs_rq() (on the cfs_rq subtree) at curr accounting time to
mark these cfs_rqs as throttled.

Do I understand correctly that now in throttle_cfs_rq(), we just mark
this hierarchy as throttled, but do not add any throttle work to the
tasks in this hierarchy, and leave adding the throttle work to pick
time?

Right, we can move throttle_cfs_rq() forward to the curr accounting time, where
it just marks these cfs_rqs as throttled.

And move setup_task_work() back to the pick task time, which makes that task
get dequeued on ret2user.


Then if we pick a task from a throttled cfs_rq subtree, we can set up task work
for it, so we don't have to bother with the delayed_dequeue task case that
Prateek mentioned.
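
To make the proposed flow concrete, here is a rough user-space sketch (a toy
model, not kernel code). The toy_* types and helpers are made up for
illustration; only the ordering mirrors what is discussed above: mark the
hierarchy as throttled at accounting time, set up the task work at pick time,
and do the actual dequeue on return to user. Walking parent pointers to find a
throttled ancestor is a simplification assumed for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, not a kernel symbol. */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL for the root */
	bool throttled;			/* set when quota runs out */
};

struct toy_task {
	struct toy_cfs_rq *cfs_rq;	/* cfs_rq the task is queued on */
	bool on_rq;			/* still enqueued? */
	bool throttle_work_pending;	/* stands in for the queued task work */
};

/* Accounting time: only mark the cfs_rq, touch no task. */
static void toy_throttle_cfs_rq(struct toy_cfs_rq *cfs_rq)
{
	cfs_rq->throttled = true;
}

/* True if this cfs_rq or any ancestor is marked throttled. */
static bool toy_throttled_hierarchy(const struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		if (cfs_rq->throttled)
			return true;
	return false;
}

/* Pick time: the single check point; setting the flag twice is harmless. */
static struct toy_task *toy_pick_task(struct toy_task *t)
{
	assert(t->on_rq);	/* only enqueued tasks can be picked */
	if (toy_throttled_hierarchy(t->cfs_rq))
		t->throttle_work_pending = true;
	return t;
}

/* Return to user: the pending work performs the actual dequeue. */
static void toy_ret_to_user(struct toy_task *t)
{
	if (!t->throttle_work_pending)
		return;
	t->throttle_work_pending = false;
	t->on_rq = false;
}
```

In this model a task that is never picked while its hierarchy is throttled
never gets any work attached, so nothing special is needed for tasks that are
on the queue but not running.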

If we add a check point at pick time, maybe we can also avoid the check
at enqueue time. One thing I'm thinking about is that a task may be picked
multiple times with only a single enqueue, so if we do the check at pick
time, could the overhead be larger?

As Prateek already mentioned, the cost of this check is negligible.


WDYT?

Thanks for your suggestion. I'll try this approach and see how it turns
out.