Re: [PATCH] sched/fair: Call update_util_est() after dequeue_entities()

From: Qais Yousef

Date: Fri May 15 2026 - 23:08:31 EST


On 05/15/26 11:35, Tim Chen wrote:
> On Tue, 2026-05-12 at 13:46 +0100, Qais Yousef wrote:
> > update_util_est() reads task_util() at dequeue which is updated in
> > dequeue_entities(). To read the accurate util_avg at dequeue, make sure
> > to do the read after load_avg is updated in dequeue_entities().
> >
> > util_est for a periodic task before
> >
> > periodic-3114 util_est.enqueued running
> > ┌───────────────────────────────────────────────────────────────────────────────────────────────┐
> > 183┤ ▖▗ ▐▖ ▖ ▗▙ ▗ ▗▙▖▖ ▖▖ ▖ ▖▖ ▗ ▟ ▗▄▖ │
> > 139┤ ▐▛█▜▙▞▀▄▄▞▚▄▟█▞▙█▄▟▀▚▄▄▞▚▄▄▟▀▀▛▄▝▄▄▄▙█▛▛█▛▜▛▄▄▀▄█▙▛▛▛▙▄▀▄▄▖▜▄▟█▟▀▜▟▄▜▀▄▄▟▙▖ │
> > 95┤ ▐▀ ▘ ▝ ▝ ▝▘ ▘ ▘▘ ▝▘ ▝▘ ▝ ▝ ▀ │
> > │ ▛ │
> > 51┤ ▐▘ │
> > 7┤ ▖▗▗ ▗▄▐ │
> > └┬─────────┬──────────┬─────────┬──────────┬─────────┬──────────┬─────────┬──────────┬─────────┬┘
> > 0.00 0.65 1.30 1.96 2.61 3.26 3.91 4.57 5.22 5.87
> >
> > and after
> >
> > periodic-2977 util_est.enqueued running
> > ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
> > 157.0┤ ▙▄ ▗▄ ▗▄▄▄ ▗▄ ▗▄▄▄▗▄▄ ▗▄▄▖ ▄ ▄▄▄ ▄ ▄▖▖ ▄▄▄▄▄▖▖▝▙▄▄▄▄▄▄▖ ▗▄ │
> > 119.5┤ ▗▄▌▘▀▀ ▀▀▀ ▝▀▀▘▝▀▀▀ ▝▀▘ ▝▀▀▘ ▀▝▀▘▀▀▀▘▝▀▀▀▀▀▀▀▘▝▝▀▀ ▀ ▝▝▀ ▀ ▀▀▀▀ │
> > 82.0┤ ▟ │
> > │ ▌ │
> > 44.5┤ ▌ │
> > 7.0┤ ▗ ▗▖ ▌ │
> > └┬─────────┬─────────┬──────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┬┘
> > 0.00 0.65 1.30 1.95 2.60 3.25 3.90 4.56 5.21 5.86
> >
> > Note how the signal is noisier and can peak to 183 vs 157 now.
> >
> > Fixes: b55945c500c5 ("sched: Fix pick_next_task_fair() vs try_to_wake_up() race")
> > Signed-off-by: Qais Yousef <qyousef@xxxxxxxxxxx>
> > ---
> >
> > This is split from [1] series where I stumbled upon this problem. AFAICS it
> > needs backporting all the way to 6.12 LTS.
> >
> > [1] https://lore.kernel.org/lkml/20260504020003.71306-1-qyousef@xxxxxxxxxxx/
> >
> > kernel/sched/fair.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 728965851842..96ba97e5f4ae 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7401,6 +7401,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
> > */
> > static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > {
> > + int ret;
> > +
> > if (task_is_throttled(p)) {
> > dequeue_throttled_task(p, flags);
> > return true;
> > @@ -7409,8 +7411,9 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > if (!p->se.sched_delayed)
> > util_est_dequeue(&rq->cfs, p);
> >
> > + ret = dequeue_entities(rq, &p->se, flags);
> > util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
>
> I thought that util_est_update() was called intentionally before dequeue_entities
> to update the utilization of task p up to this time right
> before the dequeue. Then dequeue_entities() is called later
> with up to date task utilization estimate of p.

No. If you look at older versions of dequeue_task_fair() you'll see it was done
at the end.

util_est is a holding function: it should remember the last util_avg value at
dequeue, so the util_avg update in dequeue_entities() must happen first.
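
To illustrate the ordering, here is a minimal userspace sketch, not kernel
code. All toy_* names are hypothetical, "pending" stands in for the runtime
that has not been folded into util_avg yet, and the real util_est_update()
does more than a plain copy.

```c
#include <assert.h>

/* Toy stand-in for a sched_entity; not the kernel's layout. */
struct toy_se {
	unsigned int util_avg;	/* last computed utilization */
	unsigned int pending;	/* assumption: contribution not yet folded in */
	unsigned int util_est;	/* value held across the sleep */
};

/* Models only the util_avg refresh that happens while dequeuing. */
static void toy_dequeue_entities(struct toy_se *se)
{
	se->util_avg += se->pending;
	se->pending = 0;
}

/* Simplified: latch whatever util_avg holds right now. */
static void toy_util_est_update(struct toy_se *se)
{
	se->util_est = se->util_avg;
}

/* Old order: latch first, refresh later. Works on a copy of se. */
static unsigned int toy_dequeue_old(struct toy_se se)
{
	toy_util_est_update(&se);
	toy_dequeue_entities(&se);
	return se.util_est;
}

/* New order: refresh first, then latch. Works on a copy of se. */
static unsigned int toy_dequeue_new(struct toy_se se)
{
	toy_dequeue_entities(&se);
	toy_util_est_update(&se);
	return se.util_est;
}

/* Usage: the old order holds a stale value, the new order does not. */
static void toy_order_demo(void)
{
	struct toy_se se = { .util_avg = 100, .pending = 20, .util_est = 0 };

	assert(toy_dequeue_old(se) == 100);	/* stale */
	assert(toy_dequeue_new(se) == 120);	/* up to date */
}
```

With the old order the held value misses the last stretch of runtime, which
is the kind of inaccuracy the patch is meant to remove.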

>
> Perhaps util_est_update() should be moved before
> util_est_dequeue() so the updated utilization of p
> is subtracted from the rq utilization.

We should always subtract the old value before updating it. The update
happens only at dequeue. My rampup multiplier patches introduce updates for
running tasks, but they have to do the dance of subtract, update and re-add;
otherwise you'll end up with weird util values at the rq.
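
The subtract, update and re-add dance can be sketched with a toy userspace
model. This is not the kernel implementation: the toy_* types and helpers are
hypothetical, and the update is reduced to a plain copy of util_avg.

```c
#include <assert.h>

/* Toy stand-ins; not the kernel's cfs_rq or task_struct. */
struct toy_rq   { unsigned int util_est; };
struct toy_task { unsigned int util_est; unsigned int util_avg; };

/* Add the task's held estimate to the rq sum. */
static void toy_enqueue(struct toy_rq *rq, const struct toy_task *p)
{
	rq->util_est += p->util_est;
}

/* Remove the task's held estimate from the rq sum, clamping at 0. */
static void toy_dequeue(struct toy_rq *rq, const struct toy_task *p)
{
	unsigned int sub = p->util_est < rq->util_est ? p->util_est : rq->util_est;

	rq->util_est -= sub;
}

/* Update for a task that is still enqueued: subtract, update, re-add. */
static void toy_update_running(struct toy_rq *rq, struct toy_task *p)
{
	toy_dequeue(rq, p);		/* subtract the old value */
	p->util_est = p->util_avg;	/* simplified update */
	toy_enqueue(rq, p);		/* re-add the new value */
}

/* Usage: the rq sum stays consistent with what each task contributed. */
static void toy_dance_demo(void)
{
	struct toy_rq rq = { 0 };
	struct toy_task p = { .util_est = 100, .util_avg = 150 };
	struct toy_task q = { .util_est = 50,  .util_avg = 50 };

	toy_enqueue(&rq, &p);
	toy_enqueue(&rq, &q);
	toy_update_running(&rq, &p);
	assert(rq.util_est == 200);	/* 150 for p + 50 for q */

	toy_dequeue(&rq, &p);
	assert(rq.util_est == 50);	/* only q is left */
}
```

If p->util_est were overwritten in place instead, the later subtraction would
remove 150 from a sum that only ever received 100 for p, and q's contribution
would be lost from the rq.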

>
> @@ -8002,10 +8002,10 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> return true;
> }
>
> + util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
> if (!p->se.sched_delayed)
> util_est_dequeue(&rq->cfs, p);
>
> - util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
> if (dequeue_entities(rq, &p->se, flags) < 0)
> return false;
>
>
> Tim
>
> > - if (dequeue_entities(rq, &p->se, flags) < 0)
> > + if (ret < 0)
> > return false;
> >
> > /*