Re: [External] Re: [RFC PATCH 4/7] sched/fair: Take care of migrated task for task based throttle
From: Aaron Lu
Date: Fri Mar 14 2025 - 05:50:48 EST

On Fri, Mar 14, 2025 at 09:33:10AM +0530, K Prateek Nayak wrote:
> Hello Aaron,
>
> On 3/13/2025 12:51 PM, Aaron Lu wrote:
> > If a task is migrated to a new cpu, it is possible this task is not
> > throttled but the new cfs_rq is throttled or vice versa. Take care of
> > these situations in enqueue path.
> >
> > Note that we can't handle this in migrate_task_rq_fair() because there,
> > the dst cpu's rq lock is not held and things like checking if the new
> > cfs_rq needs throttle can be racy.
> >
> > Signed-off-by: Aaron Lu <ziqianlu@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 17 +++++++++++++++++
> > 1 file changed, 17 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4a95fe3785e43..9e036f18d73e6 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7051,6 +7051,23 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >  	assert_list_leaf_cfs_rq(rq);
> >
> >  	hrtick_update(rq);
> > +
> > +	if (!cfs_bandwidth_used())
> > +		return;
> > +
> > +	/*
> > +	 * This is for migrate_task_rq_fair(): the new_cpu's rq lock is not held
> > +	 * in migrate_task_rq_fair() so we have to do these things in enqueue
> > +	 * time when the dst cpu's rq lock is held. Doing this check in enqueue
> > +	 * time also takes care of newly woken up tasks, e.g. a task wakes up
> > +	 * into a throttled cfs_rq.
> > +	 *
> > +	 * It's possible the task has a throttle work added but this new cfs_rq
> > +	 * is not in throttled hierarchy but that's OK, throttle_cfs_rq_work()
> > +	 * will take care of it.
> > +	 */
> > +	if (throttled_hierarchy(cfs_rq_of(&p->se)))
> > +		task_throttle_setup_work(p);
>
> Any reason we can't move this to somewhere towards the top?
> throttled_hierarchy() check should be cheap enough and we probably don't
> need the cfs_bandwidth_used() guarding check unless there are other
> concerns that I may have missed.

I wasn't aware of the delayed dequeue case, so I placed this at the bottom.
But as you have mentioned, for delayed dequeue tasks that get re-queued,
this check has to be at the top.

Will move it to the top in the next version.
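
To spell out why the placement matters, here is a rough user-space toy model,
not kernel code. It assumes the enqueue path returns early when a
delayed-dequeue task is re-queued; every toy_* name and TOY_ENQUEUE_DELAYED
is made up for illustration and only mimics the control flow, not the real
enqueue_task_fair().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins, NOT kernel types: just enough state to show the control flow. */
struct toy_cfs_rq {
	bool throttled;		/* models throttled_hierarchy(cfs_rq) being true */
};

struct toy_task {
	bool sched_delayed;	/* task was left queued by a delayed dequeue */
	bool throttle_work;	/* models "task_throttle_setup_work(p) was called" */
};

#define TOY_ENQUEUE_DELAYED 0x1	/* hypothetical flag for the re-queue case */

/* Check at the bottom: skipped whenever the function returns early. */
static void toy_enqueue_check_at_bottom(struct toy_cfs_rq *cfs_rq,
					struct toy_task *p, int flags)
{
	if (flags & TOY_ENQUEUE_DELAYED) {
		p->sched_delayed = false;	/* re-queue the delayed task */
		return;				/* the check below never runs */
	}

	/* ... the normal enqueue work would go here ... */

	if (cfs_rq->throttled)
		p->throttle_work = true;
}

/* Check at the top: runs before any early return. */
static void toy_enqueue_check_at_top(struct toy_cfs_rq *cfs_rq,
				     struct toy_task *p, int flags)
{
	if (cfs_rq->throttled)
		p->throttle_work = true;

	if (flags & TOY_ENQUEUE_DELAYED) {
		p->sched_delayed = false;	/* re-queue the delayed task */
		return;
	}

	/* ... the normal enqueue work would go here ... */
}

/*
 * Re-queue a delayed task onto a throttled toy cfs_rq and report whether
 * the throttle work ended up being set for it.
 */
bool toy_requeue_sets_work(bool check_at_top)
{
	struct toy_cfs_rq cfs_rq = { .throttled = true };
	struct toy_task p = { .sched_delayed = true, .throttle_work = false };

	if (check_at_top)
		toy_enqueue_check_at_top(&cfs_rq, &p, TOY_ENQUEUE_DELAYED);
	else
		toy_enqueue_check_at_bottom(&cfs_rq, &p, TOY_ENQUEUE_DELAYED);

	assert(!p.sched_delayed);	/* both variants do re-queue the task */
	return p.throttle_work;
}
```

In this model, the bottom-placed check never sets the throttle work for the
re-queued task, while the top-placed check does.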
Thanks!