Re: [PATCH] sched/fair: Call update_util_est() after dequeue_entities()
From: John Stultz
Date: Tue May 12 2026 - 16:56:06 EST
On Tue, May 12, 2026 at 5:47 AM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> update_util_est() reads task_util() at dequeue which is updated in
> dequeue_entities(). To read the accurate util_avg at dequeue, make sure
> to do the read after load_avg is updated in dequeue_entities().
>
> util_est for a periodic task before
>
> periodic-3114 util_est.enqueued running
> ┌───────────────────────────────────────────────────────────────────────────────────────────────┐
> 183┤ ▖▗ ▐▖ ▖ ▗▙ ▗ ▗▙▖▖ ▖▖ ▖ ▖▖ ▗ ▟ ▗▄▖ │
> 139┤ ▐▛█▜▙▞▀▄▄▞▚▄▟█▞▙█▄▟▀▚▄▄▞▚▄▄▟▀▀▛▄▝▄▄▄▙█▛▛█▛▜▛▄▄▀▄█▙▛▛▛▙▄▀▄▄▖▜▄▟█▟▀▜▟▄▜▀▄▄▟▙▖ │
> 95┤ ▐▀ ▘ ▝ ▝ ▝▘ ▘ ▘▘ ▝▘ ▝▘ ▝ ▝ ▀ │
> │ ▛ │
> 51┤ ▐▘ │
> 7┤ ▖▗▗ ▗▄▐ │
> └┬─────────┬──────────┬─────────┬──────────┬─────────┬──────────┬─────────┬──────────┬─────────┬┘
> 0.00 0.65 1.30 1.96 2.61 3.26 3.91 4.57 5.22 5.87
>
> and after
>
> periodic-2977 util_est.enqueued running
> ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
> 157.0┤ ▙▄ ▗▄ ▗▄▄▄ ▗▄ ▗▄▄▄▗▄▄ ▗▄▄▖ ▄ ▄▄▄ ▄ ▄▖▖ ▄▄▄▄▄▖▖▝▙▄▄▄▄▄▄▖ ▗▄ │
> 119.5┤ ▗▄▌▘▀▀ ▀▀▀ ▝▀▀▘▝▀▀▀ ▝▀▘ ▝▀▀▘ ▀▝▀▘▀▀▀▘▝▀▀▀▀▀▀▀▘▝▝▀▀ ▀ ▝▝▀ ▀ ▀▀▀▀ │
> 82.0┤ ▟ │
> │ ▌ │
> 44.5┤ ▌ │
> 7.0┤ ▗ ▗▖ ▌ │
> └┬─────────┬─────────┬──────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┬┘
> 0.00 0.65 1.30 1.95 2.60 3.25 3.90 4.56 5.21 5.86
>
> Note how the signal is noisier and can peak to 183 vs 157 now.
>
> Fixes: b55945c500c5 ("sched: Fix pick_next_task_fair() vs try_to_wake_up() race")
> Signed-off-by: Qais Yousef <qyousef@xxxxxxxxxxx>
> ---
>
> This is split from [1] series where I stumbled upon this problem. AFAICS it
> needs backporting all the way to 6.12 LTS.
>
> [1] https://lore.kernel.org/lkml/20260504020003.71306-1-qyousef@xxxxxxxxxxx/
>
> kernel/sched/fair.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 728965851842..96ba97e5f4ae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7401,6 +7401,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
> */
> static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> {
> + int ret;
> +
> if (task_is_throttled(p)) {
> dequeue_throttled_task(p, flags);
> return true;
> @@ -7409,8 +7411,9 @@ static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> if (!p->se.sched_delayed)
> util_est_dequeue(&rq->cfs, p);
>
> + ret = dequeue_entities(rq, &p->se, flags);
> util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
> - if (dequeue_entities(rq, &p->se, flags) < 0)
Hrm, so the Sashiko tool raised a reasonable concern on the earlier
version of this:
https://sashiko.dev/#/patchset/20260504020003.71306-1-qyousef%40layalina.io?part=12
Specifically, we shouldn't reference p after dequeue_entities() or we
risk racing with it being woken up, running, and maybe exiting on
another cpu.
And this moves the util_est_updat() call to after dequeue finishes.
Maybe there's some way to have util_est_update() compensate for the
unfinished accounting that will be done in dequeue_entities()?
thanks
-john