Re: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle
From: Aaron Lu
Date: Fri Mar 14 2025 - 07:47:52 EST
Hi Prateek,
On Fri, Mar 14, 2025 at 03:56:26PM +0530, K Prateek Nayak wrote:
> Hello Aaron,
>
> On 3/14/2025 3:12 PM, Aaron Lu wrote:
> > > Then if we pick a task from a throttled cfs_rq subtree, we can set up task work
> > > for it, so we don't bother with the delayed_dequeue task case that Prateek mentioned.
> > If we add a check at pick time, maybe we can also avoid the check
> > at enqueue time. One thing I'm thinking is that a task may be picked
> > multiple times with only a single enqueue, so if we do the check at
> > pick time, the overhead could be larger?
>
> I think it can be minimized to a good extent. Something like:
I see, thanks for the illustration.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d646451d617c..ba6571368840 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5942,6 +5942,9 @@ static inline bool task_has_throttle_work(struct task_struct *p)
> static inline void task_throttle_setup_work(struct task_struct *p)
> {
> + if (task_has_throttle_work(p))
> + return;
> +
> /*
> * Kthreads and exiting tasks don't return to userspace, so adding the
> * work is pointless
> @@ -5949,9 +5952,6 @@ static inline void task_throttle_setup_work(struct task_struct *p)
> if ((p->flags & (PF_EXITING | PF_KTHREAD)))
> return;
> - if (task_has_throttle_work(p))
> - return;
> -
> task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
> }
> @@ -6000,12 +6000,6 @@ static int tg_throttle_down(struct task_group *tg, void *data)
> node = rb_next(node);
> }
> - /* curr is not in the timeline tree */
> - if (cfs_rq->curr && entity_is_task(cfs_rq->curr)) {
> - p = task_of(cfs_rq->curr);
> - task_throttle_setup_work(p);
> - }
> -
Should we also stop adding throttle work for those tasks queued in
cfs_rq->tasks_timeline?
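For reference, I mean the walk just above the removed hunk; roughly like
this (paraphrasing the RFC code from memory, so exact details may differ):

	node = rb_first(&cfs_rq->tasks_timeline.rb_root);
	while (node) {
		struct sched_entity *se = __node_2_se(node);

		/* only tasks get throttle work, group entities are skipped */
		if (entity_is_task(se))
			task_throttle_setup_work(task_of(se));

		node = rb_next(node);
	}

If pick time (plus set_next_task_fair()) covers every task that can still
run in the throttled hierarchy, this walk looks redundant as well.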
> return 0;
> }
> @@ -6049,6 +6043,18 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
> SCHED_WARN_ON(cfs_rq->throttled_clock);
> if (cfs_rq->nr_queued)
> cfs_rq->throttled_clock = rq_clock(rq);
> +
> + /*
> + * If cfs_rq->curr is set, check if current task is queued
> + * and set up the throttle work proactively.
> + */
> + if (cfs_rq->curr) {
> + struct task_struct *p = rq->donor; /* scheduling context with proxy */
I'll have to check what rq->donor means.
I think the point is to proactively add throttle work for rq->curr if
rq->curr is in this throttled hierarchy? Because the only remaining place
to add throttle work would be pick time, and curr is already running, so
it probably will not go through pick again anytime soon and could return
to userspace before the work gets installed.
Thanks,
Aaron
> +
> + if (task_on_rq_queued(p))
> + task_throttle_setup_work(p);
> + }
> +
> return;
> }
> @@ -8938,6 +8944,13 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> struct sched_entity *pse = &prev->se;
> struct cfs_rq *cfs_rq;
> + /*
> + * Check if throttle work needs to be setup when
> + * switching to a different task.
> + */
> + if (throttled_hierarchy(cfs_rq_of(se)))
> + task_throttle_setup_work(p);
> +
> while (!(cfs_rq = is_same_group(se, pse))) {
> int se_depth = se->depth;
> int pse_depth = pse->depth;
> @@ -13340,6 +13353,9 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
> account_cfs_rq_runtime(cfs_rq, 0);
> }
> + if (throttled_hierarchy(cfs_rq_of(se)))
> + task_throttle_setup_work(p);
> +
> __set_next_task_fair(rq, p, first);
> }
> --
>
> .. with the additional plumbing in place of course.
>
> --
> Thanks and Regards,
> Prateek
>