Re: [RFC] Documentation/scheduler/schedutil.txt

From: Peter Zijlstra
Date: Fri Nov 20 2020 - 03:57:25 EST


On Fri, Nov 20, 2020 at 08:55:27AM +0100, Peter Zijlstra wrote:
> - In saturated scenarios task movement will cause some transient dips;
> suppose we have a CPU saturated with 4 tasks, then when we migrate a task
> to an idle CPU, the old CPU will have a 'running' value of 0.75 while the
> new CPU will gain 0.25. This is inevitable and time progression will
> correct this. XXX do we still guarantee f_max due to no idle-time?
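
To put rough numbers on that (using the default SCHED_CAPACITY_SCALE of
1024): at saturation each of the 4 tasks runs ~25% of the time, so each
contributes ~256 of util. Right after the migration the old CPU therefore
sums to ~768 (the 0.75 above) and the new one to ~256 (the 0.25), even
though neither has any idle time, and it takes a while for PELT to
re-converge.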

Do we want something like this? Is the 1.5 threshold sane? (it's been too
long since I looked at actual numbers here)
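
On the threshold itself: RUNNABLE_SAT below works out to 1024 + 512 = 1536
with the default SCHED_CAPACITY_SCALE, i.e. we only clamp once runnable_avg
says there is at least ~1.5 CPUs worth of runnable work queued up. A lone
100% task sits at runnable_avg ~= util_avg ~= 1024 and never trips it.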

---

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 68d369cba9e4..f0bed8902c40 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -90,3 +90,4 @@ SCHED_FEAT(WA_BIAS, true)
  */
 SCHED_FEAT(UTIL_EST, true)
 SCHED_FEAT(UTIL_EST_FASTUP, true)
+SCHED_FEAT(UTIL_SAT, true)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 590e6f27068c..bf70e5ed8ba6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2593,10 +2593,17 @@ static inline unsigned long cpu_util_dl(struct rq *rq)
 	return READ_ONCE(rq->avg_dl.util_avg);
 }
 
+#define RUNNABLE_SAT (SCHED_CAPACITY_SCALE + SCHED_CAPACITY_SCALE/2)
+
 static inline unsigned long cpu_util_cfs(struct rq *rq)
 {
 	unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
 
+	if (sched_feat(UTIL_SAT)) {
+		if (READ_ONCE(rq->cfs.avg.runnable_avg) > RUNNABLE_SAT)
+			return SCHED_CAPACITY_SCALE;
+	}
+
 	if (sched_feat(UTIL_EST)) {
 		util = max_t(unsigned long, util,
 			     READ_ONCE(rq->cfs.avg.util_est.enqueued));
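
FWIW, a stand-alone user-space sketch of what that check buys us; the
mock_cpu_util_cfs() helper and the numbers are made up for illustration
only, this is not the kernel code above:

/* Illustrative only: mirrors the UTIL_SAT branch in user space. */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
#define RUNNABLE_SAT		(SCHED_CAPACITY_SCALE + SCHED_CAPACITY_SCALE/2)

/*
 * Once the runnable sum says there is more than ~1.5 CPUs worth of
 * runnable work, report full capacity so schedutil asks for f_max;
 * otherwise just report the utilization we were given.
 */
static unsigned long mock_cpu_util_cfs(unsigned long util_avg,
				       unsigned long runnable_avg)
{
	if (runnable_avg > RUNNABLE_SAT)
		return SCHED_CAPACITY_SCALE;

	return util_avg;
}

int main(void)
{
	/* 4 always-runnable tasks: runnable sums to ~4096, clearly saturated. */
	printf("4 tasks      -> %lu\n", mock_cpu_util_cfs(1024, 4096));

	/*
	 * Old CPU right after the migration: util dipped to ~768, but
	 * runnable (~3072) still trips the threshold, so f_max is kept.
	 */
	printf("post-migrate -> %lu\n", mock_cpu_util_cfs(768, 3072));

	/* Lightly loaded CPU: well below the cut-off, nothing changes. */
	printf("light load   -> %lu\n", mock_cpu_util_cfs(300, 320));

	return 0;
}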