[patch] CFS scheduler: Completely Fair Scheduler / CONFIG_SCHED_FAIR

From: Ingo Molnar
Date: Sat Mar 17 2007 - 05:43:25 EST



* Nicholas Miell <nmiell@xxxxxxxxxxx> wrote:

> > this regression has to be fixed before RSDL can be merged, simply
> > because it is a pretty negative effect that goes beyond any of the
> > visible positive improvements that RSDL brings over the current
> > scheduler. If it is better to fix X, then X has to be fixed _first_,
> > at least in the form of a prototype patch that can be _tested_, and then
> > the result has to be validated against RSDL.
>
> RSDL is, above all else, fair. Predictably so.

SCHED_BATCH (an existing feature of the current scheduler) is even
fairer and even more deterministic than RSDL, because it has _zero_
heuristics.
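
for reference, a task can already be opted into SCHED_BATCH from
user-space via sched_setscheduler(); a minimal sketch (error handling
kept to a perror(), and note that glibc only exposes the SCHED_BATCH
constant when _GNU_SOURCE is defined):

	/* put the calling task into the SCHED_BATCH policy */
	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		/* sched_priority must be 0 for the non-realtime policies */
		struct sched_param sp = { .sched_priority = 0 };

		/* pid 0 means "the calling task" */
		if (sched_setscheduler(0, SCHED_BATCH, &sp) == -1) {
			perror("sched_setscheduler");
			return 1;
		}
		return 0;
	}

such a task gets no interactivity bonus at all - it is simply
timesliced round-robin against its peers, according to its nice level.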

so how about the patch below (against current -git), which adds the
"CFS, Completely Fair Scheduler" feature? With that you could test your
upcoming X fixes. (it also adds /proc/sys/kernel/sched_fair so that you
can compare the fair scheduler against the vanilla one.) It's very
simple and non-intrusive:

4 files changed, 28 insertions(+)
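
with CONFIG_SCHED_FAIR=y the mode can also be flipped at runtime: the
sysctl added below is a plain proc_dointvec with mode 0644, so
"echo 0 > /proc/sys/kernel/sched_fair" from a root shell goes back to
the vanilla behavior and "echo 1" re-enables the fair mode. the same
thing from user-space code, as a trivial sketch:

	/* write "0" or "1" (argv[1], default "1") to the sched_fair sysctl */
	#include <stdio.h>

	int main(int argc, char **argv)
	{
		const char *val = (argc > 1) ? argv[1] : "1";
		FILE *f = fopen("/proc/sys/kernel/sched_fair", "w");

		if (!f) {
			perror("/proc/sys/kernel/sched_fair");
			return 1;
		}
		fprintf(f, "%s\n", val);
		return fclose(f) ? 1 : 0;
	}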

furthermore, this is just the first step: if CONFIG_SCHED_FAIR becomes
widespread amongst distributions, we can remove the interactivity
estimator code altogether and simplify the scheduler quite a bit.
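
for background, the interactivity estimator in question derives a
dynamic priority bonus from p->sleep_avg. the sketch below is a
simplified paraphrase for illustration only - the constants and the
dynamic_prio() helper are made up, it is not the exact code from
kernel/sched.c:

	/*
	 * simplified paraphrase of the interactivity heuristic: tasks that
	 * sleep a lot accumulate sleep_avg and get a priority boost, CPU
	 * hogs get a penalty.  (illustrative constants, not the kernel's.)
	 */
	#define MAX_BONUS	10	/* bonus range, in priority levels */
	#define MAX_SLEEP_AVG	1000	/* sleep_avg is clamped to this */

	static int dynamic_prio(int static_prio, unsigned long sleep_avg)
	{
		int bonus = (int)(sleep_avg * MAX_BONUS / MAX_SLEEP_AVG)
				- MAX_BONUS / 2;

		return static_prio - bonus;
	}

zeroing sleep_avg - which __setscheduler() already does for SCHED_BATCH,
and which this patch does for SCHED_NORMAL when sched_fair is set -
pins the bonus at its minimum, so every such task at the same nice
level ends up at the same priority and is simply round-robined.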

( NOTE: more improvements are possible as well: right now most
interactivity calculations are still done even if CONFIG_SCHED_FAIR is
enabled - that could be improved upon. )

Ingo

------------------------------>
Subject: [patch] CFS scheduler: Completely Fair Scheduler
From: Ingo Molnar <mingo@xxxxxxx>

add the CONFIG_SCHED_FAIR option (default: off): this turns the Linux
scheduler into a completely fair scheduler for SCHED_OTHER tasks, with
perfect round-robin scheduling and fair distribution of timeslices, and
with no interactivity boosting and no heuristics.

a /proc/sys/kernel/sched_fair option is also available to turn
this behavior on/off.

if this option establishes itself amongst leading distributions, then we
could remove the interactivity estimator altogether in the future.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
include/linux/sched.h | 1 +
kernel/Kconfig.preempt | 9 +++++++++
kernel/sched.c | 8 ++++++++
kernel/sysctl.c | 10 ++++++++++
4 files changed, 28 insertions(+)

Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -119,6 +119,7 @@ extern unsigned long avenrun[]; /* Load
load += n*(FIXED_1-exp); \
load >>= FSHIFT;

+extern unsigned int sched_fair;
extern unsigned long total_forks;
extern int nr_threads;
DECLARE_PER_CPU(unsigned long, process_counts);
Index: linux/kernel/Kconfig.preempt
===================================================================
--- linux.orig/kernel/Kconfig.preempt
+++ linux/kernel/Kconfig.preempt
@@ -63,3 +63,12 @@ config PREEMPT_BKL
Say Y here if you are building a kernel for a desktop system.
Say N if you are unsure.

+config SCHED_FAIR
+	bool "Completely Fair Scheduler"
+	help
+	  This option turns the Linux scheduler into a completely fair
+	  scheduler. User-space workloads will round-robin fairly, and
+	  they have to be prioritized using nice levels.
+
+	  Say N if you are unsure.
+
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -4040,6 +4040,10 @@ static inline struct task_struct *find_p
return pid ? find_task_by_pid(pid) : current;
}

+#ifdef CONFIG_SCHED_FAIR
+unsigned int sched_fair = 1;
+#endif
+
/* Actually do priority change: must hold rq lock. */
static void __setscheduler(struct task_struct *p, int policy, int prio)
{
@@ -4055,6 +4059,10 @@ static void __setscheduler(struct task_s
*/
if (policy == SCHED_BATCH)
p->sleep_avg = 0;
+#ifdef CONFIG_SCHED_FAIR
+	if (policy == SCHED_NORMAL && sched_fair)
+		p->sleep_avg = 0;
+#endif
set_load_weight(p);
}

Index: linux/kernel/sysctl.c
===================================================================
--- linux.orig/kernel/sysctl.c
+++ linux/kernel/sysctl.c
@@ -205,6 +205,16 @@ static ctl_table root_table[] = {
};

static ctl_table kern_table[] = {
+#ifdef CONFIG_SCHED_FAIR
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "sched_fair",
+		.data		= &sched_fair,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
{
.ctl_name = KERN_PANIC,
.procname = "panic",