Re: [PATCH v2 10/10] sched/eevdf: Move to a single runqueue
From: K Prateek Nayak
Date: Wed May 20 2026 - 22:57:27 EST
Hello Vincent,
On 5/20/2026 10:02 PM, Vincent Guittot wrote:
> I finally found the root cause of the regression: the update of entity lag happened
> after the task has been dequeued which screwed update_entity_lag():
Great catch!
>
> update_entity_lag() must be called after updating curr and cfs_rq and before
> clearing on_rq
>
> With the fix below I'm back to original hackbench figures and maybe even a bit better.
> I haven't checked scheduling latency yet
>
> ---
> kernel/sched/fair.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 77d0e1937f2c..32fe57004f27 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5753,6 +5753,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
> update_stats_dequeue_fair(cfs_rq, se, flags);
>
> + if (entity_is_task(se))
> + update_entity_lag(&rq_of(cfs_rq)->cfs, se);
> +
> se->on_rq = 0;
Ah! The curr->on_rq indicator changes here and we'll start ignoring it
for avg_vruntime() calculation afterwards! Makes sense.
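
To illustrate the ordering issue, here is a toy userspace model, not kernel
code. The toy_* names are made up for this mail, and it assumes a plain
weighted average with no clamping and no relative keys, unlike the real
avg_vruntime() and update_entity_lag(). It only shows that the lag of curr
changes once curr->on_rq is cleared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a sched_entity. */
struct toy_se {
	int64_t vruntime;
	int64_t weight;
	bool on_rq;
};

/* Hypothetical, simplified stand-in for a cfs_rq. */
struct toy_rq {
	int64_t sum_w_vruntime;	/* sum of weight * vruntime over queued entities */
	int64_t sum_weight;	/* sum of weight over queued entities */
	struct toy_se *curr;	/* running entity, kept out of the sums above */
};

/* Weighted average vruntime; curr only counts while it is still on_rq. */
static int64_t toy_avg_vruntime(const struct toy_rq *rq)
{
	int64_t sum = rq->sum_w_vruntime;
	int64_t weight = rq->sum_weight;

	if (rq->curr && rq->curr->on_rq) {
		sum += rq->curr->vruntime * rq->curr->weight;
		weight += rq->curr->weight;
	}

	assert(weight > 0);
	return sum / weight;
}

/* Lag as the distance of the entity from the average (no clamping here). */
static int64_t toy_lag(const struct toy_rq *rq, const struct toy_se *se)
{
	return toy_avg_vruntime(rq) - se->vruntime;
}
```

With one queued entity at vruntime 100 and curr at vruntime 200 (both weight
1), toy_lag() for curr is -50 while curr->on_rq is set, and -100 once it is
cleared, since curr no longer contributes to the average.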
> account_entity_dequeue(cfs_rq, se);
>
> @@ -7423,6 +7426,7 @@ static bool __dequeue_task(struct rq *rq, struct task_struct *p, int flags)
> if (sched_feat(DELAY_DEQUEUE) && delay &&
> !entity_eligible(cfs_rq, se)) {
Does this need an update_curr() before checking entity_eligible()?
Currently these bits reside in dequeue_entity() and are always done after
an update_curr(cfs_rq), but here we may need:
update_curr(task_cfs_rq(p)); /* to catch up h_curr's vruntime */
Just doing it for task_cfs_rq(p) should be fine since we only have to
catch up curr's vruntime - sum_w_vruntime and sum_weight at root cfs_rq
should be stable for all the tasks on rb-tree.
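
A rough sketch of the concern, again as a toy model and not kernel code. The
toy_* names are hypothetical, weight is assumed to be 1 so runtime maps 1:1 to
vruntime, and the average is held fixed for simplicity. A stale vruntime can
make curr look eligible when it would not be after catching up:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the running entity. */
struct toy_curr {
	int64_t vruntime;	/* as of the last update */
	int64_t pending_exec;	/* runtime not yet folded into vruntime */
};

/* Fold the pending runtime into vruntime (unit weight assumed). */
static void toy_update_curr(struct toy_curr *c)
{
	c->vruntime += c->pending_exec;
	c->pending_exec = 0;
}

/* Eligible while vruntime has not moved past the average. */
static bool toy_eligible(const struct toy_curr *c, int64_t avg_vruntime)
{
	return c->vruntime <= avg_vruntime;
}
```

With vruntime 90, pending_exec 30 and an average of 100, toy_eligible()
returns true before toy_update_curr() and false after it, so the
DELAY_DEQUEUE branch would be skipped on stale data in this model.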
> update_load_avg(cfs_rq_of(se), se, 0);
> + update_entity_lag(cfs_rq, se);
> set_delayed(se);
> return false;
> }
> @@ -7430,7 +7434,6 @@ static bool __dequeue_task(struct rq *rq, struct task_struct *p, int flags)
>
> dequeue_hierarchy(p, flags);
>
> - update_entity_lag(cfs_rq, se);
If we decide to do an update_curr(task_cfs_rq(p)) at the beginning of
__dequeue_task(), we can just move this to above dequeue_hierarchy()
before se->on_rq indicators are modified.
Thoughts?
> if (sched_feat(PLACE_REL_DEADLINE) && !task_sleep) {
> se->deadline -= se->vruntime;
> se->rel_deadline = 1;
--
Thanks and Regards,
Prateek