Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

From: Peter Zijlstra
Date: Fri Apr 26 2024 - 05:34:03 EST


On Thu, Apr 25, 2024 at 01:49:49PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 25, 2024 at 12:42:20PM +0200, Peter Zijlstra wrote:
>
> > > I wonder if the delayed dequeue logic is having an unwanted effect on the calculation of
> > > utilization/load of the runqueue and, as a consequence, we're scheduling things to run on
> > > higher OPP's in the big cores, leading to poor decisions for energy efficiency.
> >
> > Notably util_est_update() gets delayed. Given we don't actually do an
> > enqueue when a delayed task gets woken, it didn't seem to make sense to
> > update that sooner.
>
> The PELT runnable values will be inflated because of delayed dequeue.
> cpu_util() uses those in the @boost case, and as such this can indeed
> affect things.
>
> This can also slightly affect the cgroup case, but since the delay goes
> away as contention goes away, and the cgroup case must already assume
> worst case overlap, this seems limited.
>
> /me goes ponder things moar.

First order approximation of a fix would be something like the totally
untested below I suppose...

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cfd1fd188d29..f3f70b5adca0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5391,6 +5391,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 			if (cfs_rq->next == se)
 				cfs_rq->next = NULL;
 			se->sched_delayed = 1;
+			update_load_avg(cfs_rq, se, 0);
 			return false;
 		}
 	}
@@ -6817,6 +6818,7 @@ requeue_delayed_entity(struct sched_entity *se)
 	}
 
 	se->sched_delayed = 0;
+	update_load_avg(qcfs_rq, se, 0);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d07a3b98f1fb..d16529613123 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -810,6 +810,9 @@ static inline void se_update_runnable(struct sched_entity *se)

 static inline long se_runnable(struct sched_entity *se)
 {
+	if (se->sched_delayed)
+		return false;
+
 	if (entity_is_task(se))
 		return !!se->on_rq;
 	else
@@ -823,6 +826,9 @@ static inline void se_update_runnable(struct sched_entity *se) {}

 static inline long se_runnable(struct sched_entity *se)
 {
+	if (se->sched_delayed)
+		return false;
+
 	return !!se->on_rq;
 }
 #endif
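
To illustrate why the se_runnable() change matters, here is a minimal
userspace sketch, not kernel code. It assumes a simplified PELT-like
average that decays by y per period (with y^32 ~= 0.5) and saturates at
1024. The names toy_se, toy_runnable_old(), toy_runnable_new(),
toy_avg_step() and toy_simulate() are hypothetical and exist only for
this example; the real accounting in kernel/sched/pelt.c is considerably
more involved.

```c
#include <stdbool.h>

/*
 * Toy stand-in for struct sched_entity; only the two fields the patch
 * looks at are modelled. Hypothetical, for illustration only.
 */
struct toy_se {
	bool on_rq;
	bool sched_delayed;
};

/* se_runnable() without the patch: a delayed entity is still on_rq. */
static long toy_runnable_old(const struct toy_se *se)
{
	return !!se->on_rq;
}

/* se_runnable() with the proposed early return for delayed entities. */
static long toy_runnable_new(const struct toy_se *se)
{
	if (se->sched_delayed)
		return 0;

	return !!se->on_rq;
}

/*
 * Assumed PELT-like step: the old average decays by y and a runnable
 * period contributes (1 - y) * 1024, so the average converges to 1024.
 */
static double toy_avg_step(double avg, long runnable)
{
	const double y = 0.97857206;	/* y^32 ~= 0.5 */

	return avg * y + (runnable ? 1024.0 * (1.0 - y) : 0.0);
}

/* 'busy' runnable periods followed by 'delayed' delayed-dequeue periods. */
static double toy_simulate(long (*runnable)(const struct toy_se *),
			   int busy, int delayed)
{
	struct toy_se se = { .on_rq = true, .sched_delayed = false };
	double avg = 0.0;
	int i;

	for (i = 0; i < busy; i++)
		avg = toy_avg_step(avg, runnable(&se));

	se.sched_delayed = true;	/* task slept, dequeue was delayed */
	for (i = 0; i < delayed; i++)
		avg = toy_avg_step(avg, runnable(&se));

	return avg;
}
```

Under these assumptions, toy_simulate(toy_runnable_old, 32, 32) keeps
growing through the delayed phase (roughly 768), while
toy_simulate(toy_runnable_new, 32, 32) decays during it (roughly 256).
That gap is the kind of inflation in the runnable signal that the diff
above tries to avoid.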