Re: [PATCH v2] sched: fair: Prevent negative lag increase during delayed dequeue

From: Peter Zijlstra

Date: Thu Apr 23 2026 - 05:42:25 EST


On Thu, Apr 23, 2026 at 09:28:22AM +0200, Vincent Guittot wrote:
> On Thu, 23 Apr 2026 at 00:20, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Apr 22, 2026 at 04:28:28PM +0200, Peter Zijlstra wrote:
> >
> > > Let me ponder this a bit...
> >
> > Like this? Or am I still making a mess of things? AFAICT this is the
> > exact same as your initial version.
> >
> > ---
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 69361c63353a..24e8c78b110a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -847,13 +847,13 @@ static s64 entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 avrunt
> > * Similarly, check that the entity didn't gain positive lag when DELAY_ZERO
> > * is set.
> > *
> > - * Return true if the lag has been adjusted.
> > + * Return true if the lag of a delayed entity has been adjusted.
> > */
> > static __always_inline
> > bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > s64 vlag = entity_lag(cfs_rq, se, avg_vruntime(cfs_rq));
> > - bool ret;
> > + bool ret = false;
> >
> > WARN_ON_ONCE(!se->on_rq);
> >
> > @@ -862,8 +862,9 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > vlag = max(vlag, se->vlag);
> > if (sched_feat(DELAY_ZERO))
> > vlag = min(vlag, 0);
> > +
> > + ret = (vlag != se->vlag);
>
> No, this is not enough.

Argh yes. I think I finally see. How about this then?

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69361c63353a..f4d1457d1837 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -847,13 +847,19 @@ static s64 entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 avrunt
* Similarly, check that the entity didn't gain positive lag when DELAY_ZERO
* is set.
*
- * Return true if the lag has been adjusted.
+ * Return true if the vlag has been modified. Specifically:
+ *
+ * se->vlag != avg_vruntime() - se->vruntime
+ *
+ * This can be due to clamping in entity_lag() or clamping due to
+ * sched_delayed. Either way, when vlag is modified and the entity is
+ * retained, the tree needs to be adjusted.
*/
static __always_inline
bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- s64 vlag = entity_lag(cfs_rq, se, avg_vruntime(cfs_rq));
- bool ret;
+ u64 avruntime = avg_vruntime(cfs_rq);
+ s64 vlag = entity_lag(cfs_rq, se, avruntime);

WARN_ON_ONCE(!se->on_rq);

@@ -863,10 +869,9 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
if (sched_feat(DELAY_ZERO))
vlag = min(vlag, 0);
}
- ret = (vlag == se->vlag);
se->vlag = vlag;

- return ret;
+ return avruntime - vlag != se->vruntime;
}

/*
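For anyone following along, here is a minimal user-space sketch of the invariant the new return statement checks: se->vlag was just computed from avruntime - se->vruntime plus clamping, so `avruntime - vlag != se->vruntime` is true exactly when some clamp (entity_lag()'s limit, or the sched_delayed max()/min()) actually changed the value. The struct, helper names, and the clamp limit below are made up for illustration; this models only the clamping arithmetic, not the real cfs_rq machinery, and assumes DELAY_ZERO is always set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s64;
typedef uint64_t u64;

/* Toy stand-in for a sched_entity; fields are illustrative only. */
struct toy_entity {
	u64 vruntime;
	s64 vlag;
	bool sched_delayed;
};

/* Symmetric clamp, standing in for the limit applied in entity_lag(). */
static s64 toy_clamp_lag(s64 lag, s64 limit)
{
	if (lag > limit)
		return limit;
	if (lag < -limit)
		return -limit;
	return lag;
}

/*
 * Model of the patched update_entity_lag(): returns true iff the stored
 * vlag no longer equals the raw lag avruntime - vruntime, i.e. some
 * clamp fired and the tree position would need adjusting.
 */
static bool toy_update_lag(struct toy_entity *se, u64 avruntime, s64 limit)
{
	s64 vlag = toy_clamp_lag((s64)(avruntime - se->vruntime), limit);

	if (se->sched_delayed) {
		/* max(): a delayed entity must not lose lag ... */
		if (se->vlag > vlag)
			vlag = se->vlag;
		/* ... and with DELAY_ZERO it must not gain positive lag. */
		if (vlag > 0)
			vlag = 0;
	}
	se->vlag = vlag;

	return avruntime - (u64)vlag != se->vruntime;
}
```

With this model: an unclamped, non-delayed entity returns false (vlag is exactly the raw lag); an entity whose lag exceeds the limit returns true; and a delayed entity whose stored vlag is larger (less negative) than the fresh lag keeps the stored value and returns true, which is precisely the "negative lag must not grow during delayed dequeue" case the patch is about.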