Re: [PATCH] sched/fair: Update zero_vruntime after clearing on_rq in dequeue_entity()

From: Vincent Guittot

Date: Mon Mar 23 2026 - 03:52:41 EST


On Thu, 19 Mar 2026 at 12:43, Zicheng Qu <quzicheng@xxxxxxxxxx> wrote:
>
> When dequeuing the current entity (cfs_rq->curr) in dequeue_entity(),
> the cfs_rq->zero_vruntime is updated via update_entity_lag() ->
> avg_vruntime() -> update_zero_vruntime() while curr->on_rq is still 1.
> This means the current entity is still included in the zero_vruntime
> calculation.

curr is not included in zero_vruntime; it is only added when computing
avg_vruntime(), so zero_vruntime is not impacted when curr is dequeued.
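
To illustrate the point, here is a minimal user-space sketch, not the
fair.c code. The toy_* structs and helpers are hypothetical names made up
for this example, and the model assumes the stored sums cover only the
queued (non-curr) entities, keyed relative to zero_vruntime. It leaves out
the re-centering done by update_zero_vruntime() and the kernel's rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins, NOT the kernel's struct sched_entity / struct cfs_rq. */
struct toy_se {
	int64_t vruntime;
	long weight;
	bool on_rq;
};

struct toy_cfs_rq {
	int64_t zero_vruntime;	/* reference point for the keys below */
	int64_t sum_w_key;	/* sum of weight * (vruntime - zero_vruntime), queued only */
	long sum_weight;	/* sum of weights, queued only */
	struct toy_se *curr;	/* running entity, kept out of the sums */
};

/* Account a queued (non-curr) entity in the stored sums. */
static void toy_enqueue(struct toy_cfs_rq *rq, const struct toy_se *se)
{
	rq->sum_w_key += (se->vruntime - rq->zero_vruntime) * se->weight;
	rq->sum_weight += se->weight;
}

/*
 * Weighted average vruntime. curr is folded in here, at computation time,
 * and only while curr->on_rq is set. Nothing stored in rq is modified.
 */
static int64_t toy_avg_vruntime(const struct toy_cfs_rq *rq)
{
	const struct toy_se *curr = rq->curr;
	int64_t sum = rq->sum_w_key;
	long load = rq->sum_weight;

	if (curr && curr->on_rq) {
		sum += (curr->vruntime - rq->zero_vruntime) * curr->weight;
		load += curr->weight;
	}

	/* Plain division is enough for the non-negative values used below. */
	return rq->zero_vruntime + (load ? sum / load : 0);
}

/* Small demo: clearing curr->on_rq changes the average, not the stored state. */
static int toy_demo(void)
{
	struct toy_se a = { .vruntime = 1100, .weight = 1, .on_rq = true };
	struct toy_se b = { .vruntime = 1300, .weight = 1, .on_rq = true };
	struct toy_se c = { .vruntime = 1600, .weight = 2, .on_rq = true };
	struct toy_cfs_rq rq = { .zero_vruntime = 1000, .curr = &c };

	toy_enqueue(&rq, &a);
	toy_enqueue(&rq, &b);

	assert(toy_avg_vruntime(&rq) == 1400);	/* (100 + 300 + 2 * 600) / 4 */
	c.on_rq = false;			/* curr gets dequeued */
	assert(toy_avg_vruntime(&rq) == 1200);	/* (100 + 300) / 2 */
	assert(rq.zero_vruntime == 1000);	/* stored reference untouched */
	return 0;
}
```

In this model the average seen by the next pick already reflects
curr->on_rq == 0 on the next toy_avg_vruntime() call, without any extra
update step after the dequeue.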

>
> However, immediately after this, curr->on_rq is set to 0, which should
> change the avg_vruntime() result. Without re-updating zero_vruntime, the
> stale value may be used in subsequent task selection paths:
>
> schedule() -> ... -> pick_task_fair() -> pick_next_entity() ->
> pick_eevdf() -> vruntime_eligible()
>
> If entity_tick() -> avg_vruntime() -> update_zero_vruntime() is not
> triggered in time between dequeue and the next pick, vruntime_eligible()
> may use an inaccurate cfs_rq->zero_vruntime. This can potentially cause
> all tasks to appear ineligible, leading to NULL pointer dereference.
>
> Add an explicit avg_vruntime(cfs_rq) call after clearing curr->on_rq to
> ensure cfs_rq->zero_vruntime is properly updated before the next pick.
>
> Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
> Signed-off-by: Zicheng Qu <quzicheng@xxxxxxxxxx>
> Signed-off-by: Zhang Qiao <zhangqiao22@xxxxxxxxxx>
> ---
> kernel/sched/fair.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..f8070767c2f4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5461,6 +5461,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> 	if (se != cfs_rq->curr)
> 		__dequeue_entity(cfs_rq, se);
> 	se->on_rq = 0;
> +	/* update the cfs_rq->zero_vruntime again after curr->on_rq = 0 */
> +	if (se == cfs_rq->curr)
> +		avg_vruntime(cfs_rq);
> 	account_entity_dequeue(cfs_rq, se);
>
> 	/* return excess runtime on last dequeue */
> --
> 2.34.1
>