Re: [PATCH] sched/fair: use rq_clock_task() in update_tg_load_avg() rate-limit
From: Vincent Guittot
Date: Wed May 27 2026 - 08:22:18 EST
On Wed, 27 May 2026 at 13:52, Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Wed, 2026-05-27 at 10:17 +0200, Vincent Guittot wrote:
> > On Wed, 27 May 2026 at 09:37, Vincent Guittot
> > <vincent.guittot@xxxxxxxxxx> wrote:
> > >
> > > On Wed, 27 May 2026 at 03:32, Rik van Riel <riel@xxxxxxxxxxx>
> > > wrote:
> > > >
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -4429,8 +4429,15 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
> > > > /*
> > > > * For migration heavy workloads, access to tg->load_avg can be
> > > > * unbound. Limit the update rate to at most once per ms.
> > > > - */
> > > > - now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
> > > > + *
> > > > + * The enclosing PELT update paths always hold rq->lock and have
> > > > + * called update_rq_clock(rq) within microseconds, so rq->clock_task
> > > > + * is fresh. Use it instead of sched_clock_cpu() to avoid an rdtsc
> > > > + * (plus pipeline serialisation) per call -- this function is invoked
> > > > + * once per leaf cfs_rq in __update_blocked_fair(), so on hosts with
> > > > + * many cgroups the rdtsc cost dominates the rate-limit check itself.
> > > > + */
> > > > + now = rq_clock_task(rq_of(cfs_rq));
> >
> > Why not use rq_clock()? This removes the rdtsc call and the clock
> > still moves forward whatever the irq pressure.
>
> Good idea, that does seem like a better clock to use!
>
> Does that also automatically eliminate the worry about
> task migration, since update_rq_clock() gets called
> whenever a task is attached, rqs get attached/detached,
> and in various other points including the load balancing
> code?
Yes. With rq_clock() we keep moving forward and can't get stuck by irq
or steal time, just like the current implementation, with the added
benefit of not calling sched_clock_cpu() multiple times for the same
scheduling event.
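
To illustrate the point, here is a minimal userspace sketch, not kernel
code. The structs are simplified stand-ins for the real ones, and
hw_reads, tg_updates and run_event() are hypothetical names added only
for this example. It assumes the clock is read once per scheduling event
and cached in rq->clock, so the 1 ms rate-limit check for every leaf
cfs_rq reuses that cached value:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Simplified stand-in for the kernel's struct rq; not the real layout. */
struct rq {
	uint64_t clock;		/* cached clock, refreshed once per event */
	unsigned int hw_reads;	/* hypothetical: counts expensive clock reads */
};

/* Simplified stand-in for struct cfs_rq. */
struct cfs_rq {
	struct rq *rq;
	uint64_t last_update_tg_load_avg;
	unsigned int tg_updates;	/* hypothetical: counts tg->load_avg writes */
};

/* Models update_rq_clock(): one expensive read, result cached in the rq. */
static void update_rq_clock(struct rq *rq, uint64_t hw_now)
{
	rq->hw_reads++;
	rq->clock = hw_now;
}

/* Models rq_clock(): returns the cached value, no hardware access. */
static uint64_t rq_clock(const struct rq *rq)
{
	return rq->clock;
}

/* Rate-limit the shared update to at most once per ms per cfs_rq. */
static void update_tg_load_avg(struct cfs_rq *cfs_rq)
{
	uint64_t now = rq_clock(cfs_rq->rq);

	/* The cached clock is expected to be monotonic in this model. */
	assert(now >= cfs_rq->last_update_tg_load_avg);

	if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
		return;

	cfs_rq->last_update_tg_load_avg = now;
	cfs_rq->tg_updates++;
}

/* One scheduling event: refresh the clock once, then visit every leaf. */
static void run_event(struct rq *rq, struct cfs_rq *leaves, int nr,
		      uint64_t hw_now)
{
	update_rq_clock(rq, hw_now);
	for (int i = 0; i < nr; i++)
		update_tg_load_avg(&leaves[i]);
}
```

In this model the number of expensive clock reads grows with the number
of events, not with the number of leaf cfs_rqs visited per event.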
>
>
> --
> All Rights Reversed.