Re: [PATCH v2 10/10] sched/eevdf: Move to a single runqueue
From: Vincent Guittot
Date: Thu May 21 2026 - 04:00:01 EST
On Thu, 21 May 2026 at 04:57, K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello Vincent,
>
> On 5/20/2026 10:02 PM, Vincent Guittot wrote:
> > I finally found the root cause of the regression: the update of entity lag happened
> > after the task had been dequeued, which broke update_entity_lag():
>
> Great catch!
>
> >
> > update_entity_lag() must be called after updating curr and cfs_rq and before
> > clearing on_rq
> >
> > With the fix below I'm back to original hackbench figures and maybe even a bit better.
> > I haven't checked scheduling latency yet
> >
> > ---
> > kernel/sched/fair.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 77d0e1937f2c..32fe57004f27 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5753,6 +5753,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >
> > update_stats_dequeue_fair(cfs_rq, se, flags);
> >
> > + if (entity_is_task(se))
> > + update_entity_lag(&rq_of(cfs_rq)->cfs, se);
> > +
> > se->on_rq = 0;
>
> Ah! The curr->on_rq indicator changes here and we'll start ignoring it
> for avg_vruntime() calculation afterwards! Makes sense.
>
> > account_entity_dequeue(cfs_rq, se);
> >
> > @@ -7423,6 +7426,7 @@ static bool __dequeue_task(struct rq *rq, struct task_struct *p, int flags)
> > if (sched_feat(DELAY_DEQUEUE) && delay &&
> > !entity_eligible(cfs_rq, se)) {
>
> Does this need an update_curr() before checking entity_eligible()?
Yes, we need to update curr first.
>
> Currently these bits reside in dequeue_entity() and are always done after
> an update_curr(cfs_rq), but here we may need a:
>
> update_curr(task_cfs_rq(p)); /* to catch up h_curr's vruntime */
>
> Just doing it for task_cfs_rq(p) should be fine since we only have to
> catch up curr's vruntime - sum_w_vruntime and sum_weight at root cfs_rq
> should be stable for all the tasks on rb-tree.
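To illustrate why the eligibility check needs a fresh curr, here is a toy
model. It is not kernel code: the toy_* names, the unweighted vruntime
accounting and the unclamped plain average are all assumptions made for
this sketch. It only shows that a stale curr->vruntime can flip the
result of the eligibility test, and that a second update with no elapsed
time changes nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical), much simpler than the kernel structures. */
struct toy_curr {
	int64_t vruntime;
	int64_t weight;
	int64_t exec_start;
};

struct toy_tree {
	int64_t sum_w_vruntime;	/* sum of weight * vruntime for queued entities */
	int64_t sum_weight;	/* sum of weights for queued entities */
};

/* Charge the time run since exec_start; return the delta that was charged. */
static int64_t toy_update_curr(struct toy_curr *c, int64_t now)
{
	int64_t delta_exec = now - c->exec_start;

	if (delta_exec <= 0)
		return 0;	/* nothing ran since the last update: nop */

	c->exec_start = now;
	c->vruntime += delta_exec;	/* toy: ignores weight scaling */
	return delta_exec;
}

/* Weighted average of the tree plus curr (toy: absolute values, no clamping). */
static int64_t toy_avg_vruntime(const struct toy_tree *t, const struct toy_curr *c)
{
	int64_t weight = t->sum_weight + c->weight;

	assert(weight > 0);
	return (t->sum_w_vruntime + c->vruntime * c->weight) / weight;
}

/* Toy eligibility rule: eligible while not ahead of the average. */
static bool toy_entity_eligible(const struct toy_tree *t, const struct toy_curr *c)
{
	return c->vruntime <= toy_avg_vruntime(t, c);
}

struct toy_elig {
	bool stale_eligible;	/* checked before update_curr */
	bool fresh_eligible;	/* checked after update_curr */
	int64_t second_delta;	/* delta charged by a repeated update */
};

static struct toy_elig toy_eligibility_demo(void)
{
	struct toy_tree tree = { .sum_w_vruntime = 100, .sum_weight = 1 };
	struct toy_curr curr = { .vruntime = 90, .weight = 1, .exec_start = 0 };
	struct toy_elig r;

	r.stale_eligible = toy_entity_eligible(&tree, &curr);
	toy_update_curr(&curr, 30);		/* curr ran for 30 units */
	r.fresh_eligible = toy_entity_eligible(&tree, &curr);
	r.second_delta = toy_update_curr(&curr, 30);	/* same timestamp again */
	return r;
}
```

In this toy setup the stale check reports curr as eligible while the
check after the update does not, so a DELAY_DEQUEUE-style decision based
on the stale value would go the wrong way.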
>
> > update_load_avg(cfs_rq_of(se), se, 0);
> > + update_entity_lag(cfs_rq, se);
> > set_delayed(se);
> > return false;
> > }
> > @@ -7430,7 +7434,6 @@ static bool __dequeue_task(struct rq *rq, struct task_struct *p, int flags)
> >
> > dequeue_hierarchy(p, flags);
> >
> > - update_entity_lag(cfs_rq, se);
>
> If we decide to do an update_curr(task_cfs_rq(p)) at the beginning of
> __dequeue_task(), we can just move this to above dequeue_hierarchy()
> before se->on_rq indicators are modified.
>
> Thoughts?
Yes, it's doable. We will have a spurious update_curr() in
dequeue_hierarchy(), but that will be a nop because delta_exec will be zero.
With a flat hierarchy, vruntime and deadline are no longer linked to the
cfs hierarchy. One possibility would be to move the update of vruntime
and deadline outside, but this is more complex because of delta_exec.
The same applies to dl_server.
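
As a rough illustration of why the lag must be computed before on_rq is
cleared, here is a toy model. It is not kernel code: the toy_* names are
made up, and the average is a plain unclamped weighted mean on absolute
vruntimes, which is a simplification. The only point is that once curr's
on_rq is cleared, curr drops out of the average and the computed lag
changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical), not the kernel's sched_entity / cfs_rq. */
struct toy_se {
	int64_t vruntime;
	int64_t weight;
	bool on_rq;
	int64_t vlag;
};

struct toy_rq {
	struct toy_se *curr;	/* running entity, kept out of the tree sums */
	int64_t sum_w_vruntime;	/* sum of weight * vruntime for queued entities */
	int64_t sum_weight;	/* sum of weights for queued entities */
};

/* Average over the tree, plus curr only while curr is still on_rq. */
static int64_t toy_avg_vruntime(const struct toy_rq *rq)
{
	int64_t sum = rq->sum_w_vruntime;
	int64_t weight = rq->sum_weight;

	if (rq->curr && rq->curr->on_rq) {
		sum += rq->curr->vruntime * rq->curr->weight;
		weight += rq->curr->weight;
	}

	assert(weight > 0);
	return sum / weight;
}

/* Toy lag: distance of the entity from the average (no clamping). */
static void toy_update_entity_lag(const struct toy_rq *rq, struct toy_se *se)
{
	se->vlag = toy_avg_vruntime(rq) - se->vruntime;
}

/* Order used by the fix: compute the lag, then clear on_rq. */
static int64_t toy_lag_before_clear(void)
{
	struct toy_se curr = { .vruntime = 200, .weight = 1, .on_rq = true };
	struct toy_rq rq = { .curr = &curr, .sum_w_vruntime = 100, .sum_weight = 1 };

	toy_update_entity_lag(&rq, &curr);
	curr.on_rq = false;
	return curr.vlag;
}

/* Order that caused the regression: clear on_rq, then compute the lag. */
static int64_t toy_lag_after_clear(void)
{
	struct toy_se curr = { .vruntime = 200, .weight = 1, .on_rq = true };
	struct toy_rq rq = { .curr = &curr, .sum_w_vruntime = 100, .sum_weight = 1 };

	curr.on_rq = false;
	toy_update_entity_lag(&rq, &curr);
	return curr.vlag;
}
```

With one queued entity at vruntime 100 and curr at 200, both of weight 1,
the first order gives a lag of -50 and the second gives -100, because
the late computation no longer counts curr in the average.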
>
> > if (sched_feat(PLACE_REL_DEADLINE) && !task_sleep) {
> > se->deadline -= se->vruntime;
> > se->rel_deadline = 1;
>
> --
> Thanks and Regards,
> Prateek
>