Re: [PATCH] Staircase Scheduler v6.3 for 2.6.7-rc2
From: Phy Prabab
Date: Mon Jun 07 2004 - 15:23:56 EST
FYI:
I have had a chance to test this patch. I have a make
system that has presented 2.6 kernels with problems,
so I am using this as the test. Observations show
that turning off interactive is much more
deterministic:
2.6.7-rc2-bk8-s63:
echo 0 > /proc/sys/kernel/interactive
A: 35.57user 38.18system 1:20.28elapsed 91%CPU
B: 35.54user 38.40system 1:19.48elapsed 93%CPU
C: 35.48user 38.28system 1:20.94elapsed 91%CPU
2.6.7-rc2-bk8-s63:
A: 35.32user 38.51system 1:26.47elapsed 85%CPU
B: 35.43user 38.35system 1:20.79elapsed 91%CPU
C: 35.61user 38.23system 1:25.00elapsed 86%CPU
However, something is still slower than the 2.4.x
kernels:
2.4.23:
A: 28.32user 29.51system 1:01.17elapsed 93%CPU
B: 28.54user 29.40system 1:01.48elapsed 92%CPU
B: 28.23user 28.80system 1:00.21elapsed 94%CPU
Nice work, as I can now turn off some functionality
within the kernel that is causing me some slow down.
Thanks!
Phy
--- Con Kolivas <kernel@xxxxxxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> This is an update of the scheduler policy mechanism
> rewrite using the
> infrastructure of the current O(1) scheduler. Slight
> changes from the
> original design require a detailed description. The
> change to the original
> design has enabled all known corner cases to be
> abolished and cpu
> distribution to be much better maintained. It has
> proven to be stable in my
> testing and is ready for more widespread public
> testing now.
>
>
> Aims:
> - Interactive by design rather than have
> interactivity bolted on.
> - Good scalability.
> - Simple predictable design.
> - Maintain appropriate cpu distribution and
> fairness.
> - Low scheduling latency for normal policy tasks.
> - Low overhead.
> - Making renicing processes actually matter for CPU
> distribution (nice 0 gets
> 20 times what nice +20 gets)
> - Resistant to priority inversion
> - More forgiving of poor locking
> - Tunable for a server workload or computational
> tasks
>
>
> Description:
> - All tasks start at a dynamic priority based on
> their nice value. They run
> for one RR_INTERVAL (nominally set to 10ms) and then
> drop one stair
> (priority). If they sleep before running again they
> get to run for 2
> intervals before being demoted a priority and so on
> until they get all their
> intervals at their best priority: 20 intervals for
> nice 0; 1 interval for
> nice +19.
>
>
> - - The staircase elevation mechanism can be
> disabled and all tasks can simply
> descend stairs using the sysctl:
>
> echo 0 > /proc/sys/kernel/interactive
>
> this has the effect of maintaining cpu distribution
> much more strictly
> according to nice whereas the default mechanism
> allows bursts of cpu by
> interactive tasks before settling to appropriate cpu
> distribution.
>
>
> - - The time tasks are cpu bound can be increased by
> using the sysctl:
>
> echo 1 > /proc/sys/kernel/compute
>
> which extends the RR_INTERVAL to 100ms and disables
> the staircase elevation
> improving conditions for pure computational tasks by
> optimising cache
> benefits and decreasing context switching (gains
> another 1.5% on kernbench).
>
>
> Performance:
> - - All cpu throughput benchmarks show equivalent or
> better performance than
> mainline. Note that disabling the interactive
> setting actually _worsens_ some
> benchmarks because of their dependence on yield() so
> I don't recommend
> disabling it unless you do a comparison first.
> - - Interactivity is approximately equivalent to
> mainline 2.6 but with faster
> application startup and no known behavioural quirks.
>
>
> Comments and testing welcome.
>
> fs/proc/array.c | 2
> include/linux/sched.h | 8
> include/linux/sysctl.h | 2
> kernel/sched.c | 676
> +++++++++++++------------------------------------
> kernel/sysctl.c | 16 +
> 5 files changed, 212 insertions(+), 492
> deletions(-)
>
> Can be downloaded here:
>
http://ck.kolivas.org/patches/2.6/2.6.7-rc2/patch-2.6.7-rc2-s6.3
>
> and below
> Con
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
>
iD8DBQFAwzquZUg7+tp6mRURArBHAJ9SIBOKX6MYOdkJzdb+xRNnW82JQgCghLou
> wXhrRsBsfY2BIqbLT1tUWcs=
> =g1no
> -----END PGP SIGNATURE-----
> > diff -Naurp --exclude-from=dontdiff
> linux-2.6.7-rc2-base/fs/proc/array.c
> linux-2.6.7-rc2-s6.3/fs/proc/array.c
> --- linux-2.6.7-rc2-base/fs/proc/array.c 2004-05-31
> 21:29:15.000000000 +1000
> +++ linux-2.6.7-rc2-s6.3/fs/proc/array.c 2004-06-07
> 00:03:56.959133536 +1000
> @@ -155,7 +155,6 @@ static inline char *
> task_state(struct t
> read_lock(&tasklist_lock);
> buffer += sprintf(buffer,
> "State:\t%s\n"
> - "SleepAVG:\t%lu%%\n"
> "Tgid:\t%d\n"
> "Pid:\t%d\n"
> "PPid:\t%d\n"
> @@ -163,7 +162,6 @@ static inline char *
> task_state(struct t
> "Uid:\t%d\t%d\t%d\t%d\n"
> "Gid:\t%d\t%d\t%d\t%d\n",
> get_task_state(p),
> - (p->sleep_avg/1024)*100/(1020000000/1024),
> p->tgid,
> p->pid, p->pid ? p->real_parent->pid : 0,
> p->pid && p->ptrace ? p->parent->pid : 0,
> diff -Naurp --exclude-from=dontdiff
> linux-2.6.7-rc2-base/include/linux/sched.h
> linux-2.6.7-rc2-s6.3/include/linux/sched.h
> --- linux-2.6.7-rc2-base/include/linux/sched.h
> 2004-05-31 21:29:21.000000000 +1000
> +++ linux-2.6.7-rc2-s6.3/include/linux/sched.h
> 2004-06-07 00:10:37.450576503 +1000
> @@ -164,6 +164,7 @@ extern void show_stack(struct
> task_struc
>
> void io_schedule(void);
> long io_schedule_timeout(long timeout);
> +extern int interactive, compute;
>
> extern void cpu_init (void);
> extern void trap_init(void);
> @@ -394,14 +395,13 @@ struct task_struct {
> struct list_head run_list;
> prio_array_t *array;
>
> - unsigned long sleep_avg;
> - long interactive_credit;
> unsigned long long timestamp;
> - int activated;
> + unsigned long runtime, totalrun;
> + unsigned int deadline;
>
> unsigned long policy;
> cpumask_t cpus_allowed;
> - unsigned int time_slice, first_time_slice;
> + unsigned int slice, time_slice, first_time_slice;
>
> struct list_head tasks;
> struct list_head ptrace_children;
> diff -Naurp --exclude-from=dontdiff
> linux-2.6.7-rc2-base/include/linux/sysctl.h
> linux-2.6.7-rc2-s6.3/include/linux/sysctl.h
> --- linux-2.6.7-rc2-base/include/linux/sysctl.h
> 2004-05-31 21:29:21.000000000 +1000
> +++ linux-2.6.7-rc2-s6.3/include/linux/sysctl.h
> 2004-06-07 00:06:13.007895851 +1000
> @@ -133,6 +133,8 @@ enum
> KERN_NGROUPS_MAX=63, /* int: NGROUPS_MAX */
> KERN_SPARC_SCONS_PWROFF=64, /* int: serial console
> power-off halt */
> KERN_HZ_TIMER=65, /* int: hz timer on or off */
> + KERN_INTERACTIVE=66, /* interactive tasks to have
> cpu bursts */
> + KERN_COMPUTE=67, /* adjust timeslices for a
> compute server */
> };
>
>
> diff -Naurp --exclude-from=dontdiff
> linux-2.6.7-rc2-base/kernel/sched.c
> linux-2.6.7-rc2-s6.3/kernel/sched.c
> --- linux-2.6.7-rc2-base/kernel/sched.c 2004-05-31
> 21:29:24.000000000 +1000
> +++ linux-2.6.7-rc2-s6.3/kernel/sched.c 2004-06-07
> 00:36:43.382396246 +1000
> @@ -16,7 +16,10 @@
> * by Davide Libenzi, preemptible kernel bits by
> Robert Love.
> * 2003-09-03 Interactivity tuning by Con Kolivas.
> * 2004-04-02 Scheduler domains code by Nick
> Piggin
> - */
> + * 2004-03-19. New staircase scheduling policy by
> Con Kolivas with help
> + * from Zwane Mwaikambo and useful suggestions by
> + * William Lee Irwin III.
> +*/
>
> #include <linux/mm.h>
> #include <linux/module.h>
> @@ -66,8 +69,6 @@
> #define USER_PRIO(p) ((p)-MAX_RT_PRIO)
> #define TASK_USER_PRIO(p)
> USER_PRIO((p)->static_prio)
> #define MAX_USER_PRIO (USER_PRIO(MAX_PRIO))
> -#define AVG_TIMESLICE (MIN_TIMESLICE +
> ((MAX_TIMESLICE - MIN_TIMESLICE) *\
> - (MAX_PRIO-1-NICE_TO_PRIO(0))/(MAX_USER_PRIO -
> 1)))
>
> /*
> * Some helpers for converting nanosecond timing to
> jiffy resolution
> @@ -75,110 +76,18 @@
> #define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 /
> HZ))
> #define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 /
> HZ))
>
> -/*
> - * These are the 'tuning knobs' of the scheduler:
> - *
> - * Minimum timeslice is 10 msecs, default timeslice
> is 100 msecs,
> - * maximum timeslice is 200 msecs. Timeslices get
> refilled after
> - * they expire.
> - */
> -#define MIN_TIMESLICE ( 10 * HZ / 1000)
> -#define MAX_TIMESLICE (200 * HZ / 1000)
> -#define ON_RUNQUEUE_WEIGHT 30
> -#define CHILD_PENALTY 95
> -#define PARENT_PENALTY 100
> -#define EXIT_WEIGHT 3
> -#define PRIO_BONUS_RATIO 25
> -#define MAX_BONUS (MAX_USER_PRIO *
> PRIO_BONUS_RATIO / 100)
> -#define INTERACTIVE_DELTA 2
> -#define MAX_SLEEP_AVG (AVG_TIMESLICE * MAX_BONUS)
> -#define STARVATION_LIMIT (MAX_SLEEP_AVG)
> -#define NS_MAX_SLEEP_AVG
> (JIFFIES_TO_NS(MAX_SLEEP_AVG))
> -#define CREDIT_LIMIT 100
> -
> -/*
> - * If a task is 'interactive' then we reinsert it
> in the active
> - * array after it has expired its current
> timeslice. (it will not
> - * continue to run immediately, it will still
> roundrobin with
> - * other interactive tasks.)
> - *
> - * This part scales the interactivity limit
> depending on niceness.
> - *
> - * We scale it linearly, offset by the
> INTERACTIVE_DELTA delta.
> - * Here are a few examples of different nice
> levels:
> - *
> - * TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0]
> - * TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0]
> - * TASK_INTERACTIVE( 0): [1,1,1,1,0,0,0,0,0,0,0]
> - * TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0]
> - * TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0]
> - *
> - * (the X axis represents the possible -5 ... 0 ...
> +5 dynamic
> - * priority range a task can explore, a value of
> '1' means the
> - * task is rated interactive.)
> - *
> - * Ie. nice +19 tasks can never get 'interactive'
> enough to be
> - * reinserted into the active array. And only
> heavily CPU-hog nice -20
> - * tasks will be expired. Default nice 0 tasks are
> somewhere between,
> - * it takes some effort for them to get
> interactive, but it's not
> - * too hard.
> - */
> -
> -#define CURRENT_BONUS(p) \
> - (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / \
> - MAX_SLEEP_AVG)
> -
> -#ifdef CONFIG_SMP
> -#define TIMESLICE_GRANULARITY(p) (MIN_TIMESLICE * \
> - (1 << (((MAX_BONUS - CURRENT_BONUS(p)) ? : 1) -
> 1)) * \
> - num_online_cpus())
> -#else
> -#define TIMESLICE_GRANULARITY(p) (MIN_TIMESLICE * \
> - (1 << (((MAX_BONUS - CURRENT_BONUS(p)) ? : 1) -
> 1)))
> -#endif
> -
> -#define SCALE(v1,v1_max,v2_max) \
> - (v1) * (v2_max) / (v1_max)
>
=== message truncated ===
__________________________________
Do you Yahoo!?
Friends. Fun. Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/