Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle
From: Aaron Lu
Date: Fri Mar 14 2025 - 05:43:04 EST
On Fri, Mar 14, 2025 at 04:39:41PM +0800, Chengming Zhou wrote:
> On 2025/3/13 15:21, Aaron Lu wrote:
> > From: Valentin Schneider <vschneid@xxxxxxxxxx>
> >
> > Once a cfs_rq gets throttled, for all tasks belonging to this cfs_rq,
> > add a task work to them so that when those tasks return to user, the
> > actual throttle/dequeue can happen.
> >
> > Note that since the throttle/dequeue always happens on a task basis when
> > it returns to user, it's no longer necessary for check_cfs_rq_runtime()
> > to return a value for pick_task_fair() to act on, so
> > check_cfs_rq_runtime() is changed to not return a value.
>
> Previously with the per-cfs_rq throttling, we use update_curr() -> put() path
> to throttle the cfs_rq and dequeue it from the cfs_rq tree.
>
> Now with your per-task throttling, maybe things can become simpler: we can
> just throttle_cfs_rq() (the cfs_rq subtree) during curr accounting to mark
> them throttled.
Do I understand correctly that throttle_cfs_rq() would now just mark the
hierarchy as throttled, without adding any throttle work to the tasks in
that hierarchy, and defer adding the throttle work to pick time?
> Then if we pick a task from a throttled cfs_rq subtree, we can set up task work
> for it, so we don't bother with the delayed_dequeue task case that Prateek mentioned.
If we add a check at pick time, maybe we can also avoid the check at
enqueue time. One thing I'm thinking about: a task may be picked multiple
times with only a single enqueue, so if we do the check at pick time, the
overhead can be larger?
> WDYT?
Thanks for your suggestion. I'll try this approach and see how it turns
out.