[RFC 03/11] sched/fair: Only use time once
From: Peter Zijlstra
Date: Wed May 30 2018 - 10:39:00 EST
The goal is to not spend more time scanning for idle CPUs than we're
idle for. Otherwise we're inhibiting work.

This means that we need to consider the cost over all the wakeups
between consecutive idle periods.

Combined AGE+ONCE work better than the old code:
ORIG
1: 0.559639567 seconds time elapsed ( +- 1.44% )
2: 0.630091207 seconds time elapsed ( +- 2.93% )
5: 2.329768398 seconds time elapsed ( +- 1.21% )
10: 3.920248646 seconds time elapsed ( +- 2.39% )
20: 6.501776759 seconds time elapsed ( +- 1.02% )
40: 10.482109619 seconds time elapsed ( +- 2.16% )
AGE+ONCE
1: 0.546238431 seconds time elapsed ( +- 0.84% )
2: 0.620581405 seconds time elapsed ( +- 1.26% )
5: 2.161288964 seconds time elapsed ( +- 1.90% )
10: 3.514636966 seconds time elapsed ( +- 1.82% )
20: 6.228234657 seconds time elapsed ( +- 0.67% )
40: 9.755615438 seconds time elapsed ( +- 2.20% )
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
---
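Note: the accounting added under SIS_ONCE can be tried outside the kernel
with a minimal userspace sketch. It assumes wake_avg is an unsigned 64-bit
nanosecond budget; struct rq_model and charge_scan() are hypothetical names
used only for illustration and are not part of this patch. Each scan is
charged against the remaining predicted idle time, and the budget is clamped
at zero so the unsigned value cannot wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rq; only wake_avg is modelled here. */
struct rq_model {
	uint64_t wake_avg;	/* predicted idle time still unspent, in ns (assumed unit) */
};

/*
 * Charge one scan against the budget. Mirrors the if/else in the hunk
 * below: subtract when enough budget remains, otherwise clamp to zero
 * so that the unsigned subtraction can never underflow.
 */
static void charge_scan(struct rq_model *rq, uint64_t scan_ns)
{
	assert(rq != NULL);

	if (rq->wake_avg > scan_ns)
		rq->wake_avg -= scan_ns;
	else
		rq->wake_avg = 0;
}
```

Starting from a budget of 1000, charging scans of 300, 400 and 500 leaves
700, then 300, then 0: the third scan costs more than what remains, so the
budget is exhausted rather than wrapped.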
kernel/sched/fair.c | 15 +++++++++++++++
kernel/sched/features.h | 1 +
2 files changed, 16 insertions(+)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6429,6 +6429,21 @@ static int select_idle_cpu(struct task_s
}
time = local_clock() - time;
+
+ if (sched_feat(SIS_ONCE)) {
+ struct rq *this_rq = this_rq();
+
+ /*
+ * We need to consider the cost of all wakeups between
+ * consecutive idle periods. We can only use the predicted
+ * idle time once.
+ */
+ if (this_rq->wake_avg > time)
+ this_rq->wake_avg -= time;
+ else
+ this_rq->wake_avg = 0;
+ }
+
time = div_u64(time, loops);
cost = this_sd->avg_scan_cost;
delta = (s64)(time - cost) / 8;
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -59,6 +59,7 @@ SCHED_FEAT(SIS_AVG_CPU, false)
SCHED_FEAT(SIS_PROP, true)
SCHED_FEAT(SIS_AGE, true)
+SCHED_FEAT(SIS_ONCE, true)
/*
* Issue a WARN when we do multiple update_rq_clock() calls