Re: [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix

From: Vincent Guittot

Date: Wed Apr 01 2026 - 10:58:45 EST

On Wed, 1 Apr 2026 at 15:24, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> John reported that stress-ng-yield could make his machine unhappy and
> managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
> zero_vruntime tracking").
>
> The combination of yield and that commit was specific enough to
> hypothesize the following scenario:
>
> Suppose we have 2 runnable tasks, both doing yield. Then one will be
> eligible and one will not be, because the average position must be in
> between these two entities.
>
> Therefore, the runnable task will be eligible, and be promoted a full
> slice (all the tasks do is yield after all). This causes it to jump over
> the other task and now the other task is eligible and current is no
> longer. So we schedule.
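
A toy model makes the leapfrog easy to trace. This is a minimal sketch, not kernel code. `struct toy_se`, `toy_pick()`, `toy_yield_old()` and `toy_leapfrog()` are hypothetical names. Both tasks are assumed to have equal nice-0 weight, so `calc_delta_fair()` would leave the slice unchanged and the task with the lower vruntime is the eligible one. `toy_yield_old()` mirrors the pre-patch yield lines that the diff removes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a sched_entity; slice is in vruntime units. */
struct toy_se {
	int64_t vruntime;
	int64_t deadline;
	int64_t slice;
};

/*
 * With two equal-weight entities the average lies between them, so the
 * one with the lower vruntime is the eligible one (ties go to entity 0).
 */
int toy_pick(const struct toy_se se[2])
{
	return se[0].vruntime <= se[1].vruntime ? 0 : 1;
}

/* Pre-patch yield: jump to the deadline, push the deadline out a slice. */
void toy_yield_old(struct toy_se *se)
{
	assert(se->slice > 0);	/* toy precondition */
	se->vruntime = se->deadline;
	se->deadline += se->slice;
}

/*
 * Run @n pick-then-yield rounds, recording who ran in @ran[]. Returns 1 if
 * every yield carried the yielding task past the other one, else 0.
 */
int toy_leapfrog(struct toy_se se[2], int n, int ran[])
{
	int always_jumped = 1;

	for (int i = 0; i < n; i++) {
		int cur = toy_pick(se);

		toy_yield_old(&se[cur]);
		ran[i] = cur;
		if (se[cur].vruntime <= se[!cur].vruntime)
			always_jumped = 0;
	}
	return always_jumped;
}
```

Starting from vruntimes 0 and 1 with a slice of 1000, the two tasks strictly alternate and each yield lands the yielder ahead of the other. Nothing in the loop enqueues or dequeues an entity.
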
>
> Since we are runnable, there is no {de,en}queue. All we have is the
> __{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
> fingered commit, those two no longer move zero_vruntime.
>
> All that moves zero_vruntime are tick and full {de,en}queue.
>
> This means that if the two tasks playing leapfrog can reach the
> critical speed needed to hit the overflow point within one tick's worth
> of time, we're up a creek.
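
The same toy model can put rough numbers on the drift that builds up between ticks. The sketch assumes, for illustration only, that the overflow that matters is the signed 64-bit product of an entity's key and its weight, as a weighted average of keys would accumulate it. The function name is hypothetical. `__builtin_mul_overflow()` is the GCC/Clang builtin that reports whether a multiplication overflowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (hypothetical): with the reference point frozen, each pair of
 * leapfrog yields moves both tasks one more slice past it, so the leading
 * task's key after @yields yields is about ceil(yields / 2) slices.
 * Returns false if that key, or key * weight, no longer fits in an s64.
 */
bool frozen_drift_fits(int64_t yields, int64_t slice, int64_t weight)
{
	int64_t key, prod;

	assert(yields >= 0 && slice > 0 && weight > 0);
	/* ceil(yields / 2) without risking overflow in yields + 1 */
	if (__builtin_mul_overflow(yields / 2 + yields % 2, slice, &key))
		return false;
	return !__builtin_mul_overflow(key, weight, &prod);
}
```

In this model the key grows linearly with the number of yields. Only something that moves the reference point, such as the tick, resets it.
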
>
> Additionally, when multiple cgroups are involved, there is no guarantee
> the tick will in fact hit every cgroup in a timely manner. Statistically
> speaking it will, but those same statistics do not rule out the
> possibility of one cgroup not getting a tick for a significant amount of
> time -- however unlikely.
>
> Therefore, just like with the yield() case, force an update at the end
> of every slice. This ensures the update is never more than a single
> slice behind and the whole thing is within 2 lag bounds as per the
> comment on entity_key().
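
Extending the toy model, the effect of the fix can be compared directly. Moving the reference point to the average at every slice end stands in for the `avg_vruntime()` call that the patch adds to `update_deadline()`. With that update, the keys stay within about half a slice however long the tasks leapfrog. With a frozen reference, they grow with every yield. `leapfrog_max_key()` is a hypothetical name, and the "average" is a plain mean because the weights are assumed equal.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value for the toy keys (no overflow concerns at these sizes). */
static int64_t abs64(int64_t x)
{
	return x < 0 ? -x : x;
}

/*
 * Toy model (hypothetical): two equal-weight tasks leapfrog by yielding.
 * When @update_each_slice is set, @zero is moved to the mean vruntime at
 * every slice end, mimicking the patched update_deadline(). Returns the
 * largest |vruntime - zero| seen after @n yields.
 */
int64_t leapfrog_max_key(int n, int64_t slice, int update_each_slice)
{
	int64_t v[2] = { 0, 1 }, d[2] = { slice, 1 + slice };
	int64_t zero = 0, max_key = 0;

	assert(n >= 0 && slice > 0);
	for (int i = 0; i < n; i++) {
		int cur = v[0] <= v[1] ? 0 : 1;	/* the eligible one */

		v[cur] = d[cur];		/* yield: jump to the deadline */
		d[cur] = v[cur] + slice;	/* new deadline one slice out */
		if (update_each_slice)
			zero = (v[0] + v[1]) / 2;	/* toy avg_vruntime() */

		for (int j = 0; j < 2; j++)
			if (abs64(v[j] - zero) > max_key)
				max_key = abs64(v[j] - zero);
	}
	return max_key;
}
```

In this model the bound with per-slice updates does not depend on how many yields fit between ticks, which is the property the changelog relies on. The model does not reproduce the two-lag-bound argument in the `entity_key()` comment.
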
>
> Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
> Reported-by: John Stultz <jstultz@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> Tested-by: John Stultz <jstultz@xxxxxxxxxx>

Reviewed-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>

> ---
> kernel/sched/fair.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -707,7 +707,7 @@ void update_zero_vruntime(struct cfs_rq
> * Called in:
> * - place_entity() -- before enqueue
> * - update_entity_lag() -- before dequeue
> - * - entity_tick()
> + * - update_deadline() -- slice expiration
> *
> * This means it is one entry 'behind' but that puts it close enough to where
> * the bound on entity_key() is at most two lag bounds.
> @@ -1131,6 +1131,7 @@ static bool update_deadline(struct cfs_r
> * EEVDF: vd_i = ve_i + r_i / w_i
> */
> se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
> + avg_vruntime(cfs_rq);
>
> /*
> * The task has consumed its request, reschedule.
> @@ -5593,11 +5594,6 @@ entity_tick(struct cfs_rq *cfs_rq, struc
> update_load_avg(cfs_rq, curr, UPDATE_TG);
> update_cfs_group(curr);
>
> - /*
> - * Pulls along cfs_rq::zero_vruntime.
> - */
> - avg_vruntime(cfs_rq);
> -
> #ifdef CONFIG_SCHED_HRTICK
> /*
> * queued ticks are scheduled to match the slice, so don't bother
> @@ -9128,7 +9124,7 @@ static void yield_task_fair(struct rq *r
> */
> if (entity_eligible(cfs_rq, se)) {
> se->vruntime = se->deadline;
> - se->deadline += calc_delta_fair(se->slice, se);
> + update_deadline(cfs_rq, se);
> }
> }
>
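
The two functional hunks can be mirrored in a toy model to show the ordering they establish. `yield_task_fair()` now goes through `update_deadline()`, which sets the new deadline and then pulls the reference point along. This is a sketch under stated assumptions, not the kernel code. The `toy_*` names are hypothetical, and weights are equal so that the slice is already in vruntime units. Beyond what the diff shows, it also assumes that `update_deadline()` returns early while the entity has not yet reached its deadline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; equal weights, slice in vruntime units. */
struct toy_se { int64_t vruntime, deadline, slice; };
struct toy_cfs_rq { struct toy_se *se[2]; int64_t zero_vruntime; };

/* Toy avg_vruntime(): move the reference point to the mean vruntime. */
static void toy_avg_vruntime(struct toy_cfs_rq *cfs_rq)
{
	cfs_rq->zero_vruntime =
		(cfs_rq->se[0]->vruntime + cfs_rq->se[1]->vruntime) / 2;
}

/* Toy update_deadline(), including the avg_vruntime() call the patch adds. */
static bool toy_update_deadline(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	if (se->vruntime < se->deadline)	/* assumed early return */
		return false;

	se->deadline = se->vruntime + se->slice;	/* vd_i = ve_i + r_i / w_i */
	toy_avg_vruntime(cfs_rq);			/* added by the patch */
	return true;
}

/* Toy yield_task_fair() after the patch; the caller checked eligibility. */
void toy_yield_new(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	assert(se->slice > 0);	/* toy precondition */
	se->vruntime = se->deadline;
	(void)toy_update_deadline(cfs_rq, se);
}
```

`vruntime` has just been set equal to `deadline`, so the assumed early return never fires on this path. The new deadline therefore equals what the removed `deadline += slice` line produced. In this model, the only behavioural change is that the reference point is re-centred on every yield.
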