Re: [PATCH] sched/fair: Update zero_vruntime after clearing on_rq in dequeue_entity()
From: Vincent Guittot
Date: Mon Mar 23 2026 - 06:41:32 EST
On Mon, 23 Mar 2026 at 10:57, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 23, 2026 at 08:52:21AM +0100, Vincent Guittot wrote:
> > On Thu, 19 Mar 2026 at 12:43, Zicheng Qu <quzicheng@xxxxxxxxxx> wrote:
> > >
> > > When dequeuing the current entity (cfs_rq->curr) in dequeue_entity(),
> > > the cfs_rq->zero_vruntime is updated via update_entity_lag() ->
> > > avg_vruntime() -> update_zero_vruntime() while curr->on_rq is still 1.
> > > This means the current entity is still included in the zero_vruntime
> > > calculation.
> >
> > curr is not included in zero_vruntime but added when computing
> > avg_vruntime, so zero_vruntime is not impacted when curr is dequeued.
>
> It is, we explicitly add curr back in.
Yeah, I should have had one more coffee before replying.
>
> > > However, immediately after this, curr->on_rq is set to 0, which should
> > > change the avg_vruntime() result. Without re-updating zero_vruntime, the
> > > stale value may be used in subsequent task selection paths:
> > >
> > > schedule() -> ... -> pick_task_fair() -> pick_next_entity() ->
> > > pick_eevdf() -> vruntime_eligible()
> > >
> > > If entity_tick() -> avg_vruntime() -> update_zero_vruntime() is not
> > > triggered in time between dequeue and the next pick, vruntime_eligible()
> > > may use an inaccurate cfs_rq->zero_vruntime. This can potentially cause
> > > all tasks to appear ineligible, leading to a NULL pointer dereference.
>
> This makes no sense.
>
> One entity worth of vruntime should not affect things to the point of
> overrun. Yes, it is true that zero_vruntime != avg_vruntime() right
> after a dequeue, but that doesn't matter.
>
> vruntime_eligible() does the same math that avg_vruntime() does and
> takes this difference into account.
>
> As long as zero_vruntime is close 'enough' to avg_vruntime, all the
> deltas are small and nothing overflows.
>