Re: [PATCH] sched: fair: Prevent negative lag increase during delayed dequeue

From: Vincent Guittot

Date: Tue Mar 31 2026 - 04:57:39 EST


On Mon, 30 Mar 2026 at 23:32, Shubhang Kaushik
<shubhang@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello Vincent,
>
> On Mon, 30 Mar 2026, Vincent Guittot wrote:
>
> > On Mon, 30 Mar 2026 at 16:06, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>
> >> On Fri, Mar 27, 2026 at 07:02:19PM +0100, Vincent Guittot wrote:
> >>> Delayed dequeue feature aims to reduce the negative lag of a dequeued task
> >>> while sleeping but it can happens that newly enqueued tasks will move
> >>> backward the avg vruntime and increase its negative lag.
> >>> When the delayed dequeued task wakes up, it has more neg lag compared to
> >>> being dequeued immediately or to other delayed or not tasks that have been
> >>> dequeued just before theses new enqueues.
> >>>
> >>> Ensure that the negative lag of a delayed dequeued task doesn't increase
> >>> during its delayed dequeued phase while waiting for its neg lag to
> >>> diseappear. Similarly, we remove any positive lag that the delayed
> >>> dequeued task could have gain during thsi period.
> >>
> >> *groan*, indeed!
> >>
> >>
> >>> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >>> ---
> >>> kernel/sched/fair.c | 9 +++++++--
> >>> 1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 226509231e67..efa9dfa8c583 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -5595,6 +5595,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >>> if (sched_feat(DELAY_DEQUEUE) && delay &&
> >>> !entity_eligible(cfs_rq, se)) {
> >>> update_load_avg(cfs_rq, se, 0);
> >>> + update_entity_lag(cfs_rq, se);
> >>> set_delayed(se);
> >>> return false;
> >>> }
> >>> @@ -7089,12 +7090,16 @@ requeue_delayed_entity(struct sched_entity *se)
> >>> WARN_ON_ONCE(!se->on_rq);
> >>>
> >>> if (sched_feat(DELAY_ZERO)) {
> >>> + s64 vlag, prev_vlag = se->vlag;
> >>> update_entity_lag(cfs_rq, se);
> >>> - if (se->vlag > 0) {
> >>> + /* prev_vlag < 0 otherwise se would not be delayed */
> >>> + vlag = clamp(se->vlag, prev_vlag, 0);
> >>> +
> >>> + if (vlag != se->vlag) {
> >>> cfs_rq->nr_queued--;
> >>> if (se != cfs_rq->curr)
> >>> __dequeue_entity(cfs_rq, se);
> >>> - se->vlag = 0;
> >>> + se->vlag = vlag;
> >>> place_entity(cfs_rq, se, 0);
> >>> if (se != cfs_rq->curr)
> >>> __enqueue_entity(cfs_rq, se);
> >>
> >> So I think this should apply irrespective of DELAY_ZERO -- although
> >> since we default to that, it is the most relevant.
> >
> > I can try to link it only to delayed dequeue
> >
> >>
> >> Also, I'm thinking we might want a comment with this. We're bound to
> >> forget the details at some point.
> >
> > yes
> >
> >>
> >> Also, there is one other site that has DLAY_ZERO/clear_delayed(), does
> >> that want updating too?
> >
> > My assumption was that the other site happens when se is picked and
> > its lag is positive but we can sometimes force dequeue of delayed
> > entities.
> > Will add it
> >
> It seems finish_delayed_dequeue_entity() is another path where the delayed
> state is cleared. Would it be worth applying the same lag bounding there
> as well to keep the behavior consistent regardless of how the task exits
> the delayed state?

Yes that's the othersite I have in mind

>
> Regards,
> Shubhang Kaushik