[RFC PATCH 06/11] sched: dynamic min_vruntime

From: Mathieu Desnoyers
Date: Thu Aug 26 2010 - 14:15:09 EST


[ Impact: Fixes the large vruntime spread problems I identified last fall, but
might have bad side-effects on Xorg interactivity. See the INTERACTIVE
feature in a following patch that addresses this. ]

Push the scheduler's dynamic min_vruntime forward upon deschedule. This ensures
that the following workload does not grow the vruntime spread to very large
values over time (give it 1-2 minutes). A spread that large makes the scheduler
behave oddly with combined Xorg and latency-sensitive threads: Xorg ends up at
the beginning of the spread, and the latency-sensitive workloads end up
somewhere in the middle of it.

periodic-fork.sh:

#!/bin/bash

while ((1)); do
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	tac /etc/passwd > /dev/null;
	sleep 1;
done

My test program is wakeup-latency.c, originally provided by Nokia. A 10ms timer
spawns a thread which reads the time and prints a warning if the expected
deadline has been missed by too much. It also warns about timer overruns.
It is available at:

http://www.efficios.com/pub/elc2010/wakeup-latency-0.1.tar.bz2
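
To make the "late by" and "maximum/average latency" figures below easier to
read, here is a minimal sketch of how such numbers can be derived from a
periodic deadline. It is not taken from wakeup-latency.c: the names
latency_stats, next_deadline(), late_by_us() and record_sample() are
hypothetical, and only the 10ms period comes from the description above.

```c
#include <assert.h>
#include <time.h>

#define PERIOD_NS	10000000L	/* 10 ms timer period, as described above */
#define NSEC_PER_SEC	1000000000L

/* Hypothetical accumulator; not the structure used by wakeup-latency.c. */
struct latency_stats {
	double max_us;
	double sum_us;
	unsigned long samples;
};

/* Advance a deadline by one period, keeping tv_nsec normalized. */
static void next_deadline(struct timespec *t)
{
	t->tv_nsec += PERIOD_NS;
	if (t->tv_nsec >= NSEC_PER_SEC) {
		t->tv_nsec -= NSEC_PER_SEC;
		t->tv_sec++;
	}
}

/* Microseconds by which 'now' trails 'deadline' (negative if early). */
static double late_by_us(const struct timespec *deadline,
			 const struct timespec *now)
{
	return (double)(now->tv_sec - deadline->tv_sec) * 1e6 +
	       (double)(now->tv_nsec - deadline->tv_nsec) / 1e3;
}

/* Fold one lateness sample into the running maximum and sum. */
static void record_sample(struct latency_stats *s, double late_us)
{
	assert(s != NULL);
	if (s->samples == 0 || late_us > s->max_us)
		s->max_us = late_us;
	s->sum_us += late_us;
	s->samples++;
}
```

In a real measurement loop, 'now' would typically come from
clock_gettime(CLOCK_MONOTONIC, &now) at thread wakeup, and the average is
sum_us / samples once the run ends.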

With periodic-fork.sh and Xorg running, without the DYN_MIN_VRUNTIME feature,
but with the INTERACTIVE, INTERACTIVE_FORK_EXPEDITED, TIMER and
TIMER_FORK_EXPEDITED features enabled:

....
min priority: 0, max priority: 0
late by: 6765.8 µs
late by: 5536.1 µs
overruns: 1
late by: 12212.3 µs
late by: 5477.5 µs
overruns: 1
late by: 12259.3 µs
overruns: 1
late by: 12224.9 µs
overruns: 1
late by: 12214.3 µs
overruns: 1
late by: 12196.2 µs

maximum latency: 12259.3 µs
average latency: 46.4 µs
missed timer events: 5

Now the same workload with the DYN_MIN_VRUNTIME feature enabled:

min priority: 0, max priority: 0

maximum latency: 2908.3 µs
average latency: 6.9 µs
missed timer events: 0
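
The floor update that DYN_MIN_VRUNTIME enables can be modeled in user space
roughly as follows. This is a sketch under stated assumptions, not the kernel
code: scale_delta() is a hypothetical, simplified stand-in for
calc_delta_mine() (plain integer division, no inverse-weight arithmetic),
next_min_vruntime() is a hypothetical name, 'candidate' stands for the minimum
of the current and leftmost entity vruntime, and NICE_0_LOAD is assumed to be
1024.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD	1024ULL	/* assumed weight of a nice-0 task */

/* Wraparound-safe maximum, mirroring the kernel's max_vruntime(). */
static uint64_t max_vruntime(uint64_t a, uint64_t b)
{
	return ((int64_t)(b - a) > 0) ? b : a;
}

/*
 * Hypothetical, simplified stand-in for calc_delta_mine():
 * scales delta_exec by NICE_0_LOAD / rq_load.
 */
static uint64_t scale_delta(uint64_t delta_exec, uint64_t rq_load)
{
	assert(rq_load > 0);
	return delta_exec * NICE_0_LOAD / rq_load;
}

/*
 * Model of the new floor computation. With the feature off or with
 * delta_exec == 0 (the dequeue path), this reduces to the old
 * max_vruntime(min_vruntime, candidate) behavior.
 */
static uint64_t next_min_vruntime(uint64_t min_vruntime, uint64_t candidate,
				  uint64_t delta_exec, uint64_t rq_load,
				  int dyn_min_vruntime)
{
	uint64_t floor = min_vruntime;

	if (dyn_min_vruntime && delta_exec)
		floor += scale_delta(delta_exec, rq_load);

	return max_vruntime(floor, candidate);
}
```

The point of the model: even when every queued entity lags behind
min_vruntime, the floor still advances by the load-scaled execution time, so
the distance between the floor and the furthest-ahead entity stays bounded.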

Inspired by a patch from Peter Zijlstra.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
---
kernel/sched_fair.c | 15 ++++++++++-----
kernel/sched_features.h | 6 ++++++
2 files changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6-lttng.git/kernel/sched_fair.c
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched_fair.c
+++ linux-2.6-lttng.git/kernel/sched_fair.c
@@ -301,9 +301,9 @@ static inline s64 entity_key(struct cfs_
return se->vruntime - cfs_rq->min_vruntime;
}

-static void update_min_vruntime(struct cfs_rq *cfs_rq)
+static void update_min_vruntime(struct cfs_rq *cfs_rq, unsigned long delta_exec)
{
- u64 vruntime = cfs_rq->min_vruntime;
+ u64 vruntime = cfs_rq->min_vruntime, new_vruntime;

if (cfs_rq->curr)
vruntime = cfs_rq->curr->vruntime;
@@ -319,7 +319,12 @@ static void update_min_vruntime(struct c
vruntime = min_vruntime(vruntime, se->vruntime);
}

- cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
+ new_vruntime = cfs_rq->min_vruntime;
+ if (sched_feat(DYN_MIN_VRUNTIME) && delta_exec)
+ new_vruntime += calc_delta_mine(delta_exec, NICE_0_LOAD,
+ &cfs_rq->load);
+
+ cfs_rq->min_vruntime = max_vruntime(new_vruntime, vruntime);
}

/*
@@ -513,7 +518,7 @@ __update_curr(struct cfs_rq *cfs_rq, str
delta_exec_weighted = calc_delta_fair(delta_exec, curr);

curr->vruntime += delta_exec_weighted;
- update_min_vruntime(cfs_rq);
+ update_min_vruntime(cfs_rq, delta_exec);
}

static void update_curr(struct cfs_rq *cfs_rq)
@@ -822,7 +827,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
if (se != cfs_rq->curr)
__dequeue_entity(cfs_rq, se);
account_entity_dequeue(cfs_rq, se);
- update_min_vruntime(cfs_rq);
+ update_min_vruntime(cfs_rq, 0);

/*
* Normalize the entity after updating the min_vruntime because the
Index: linux-2.6-lttng.git/kernel/sched_features.h
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched_features.h
+++ linux-2.6-lttng.git/kernel/sched_features.h
@@ -57,6 +57,12 @@ SCHED_FEAT(LB_SHARES_UPDATE, 1)
SCHED_FEAT(ASYM_EFF_LOAD, 1)

/*
+ * Push the min_vruntime spread floor value when descheduling a task. This
+ * ensures the spread does not grow beyond control.
+ */
+SCHED_FEAT(DYN_MIN_VRUNTIME, 0)
+
+/*
* Spin-wait on mutex acquisition when the mutex owner is running on
* another cpu -- assumes that when the owner is running, it will soon
* release the lock. Decreases scheduling overhead.
