Re: [patch] CFS scheduler, -v18

From: Ingo Molnar
Date: Tue Jun 26 2007 - 04:39:20 EST



* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> So I locally generated the diff to take -mm up to the above version of
> CFS.

thx. I released a diff against mm2:

http://people.redhat.com/mingo/cfs-scheduler/sched-cfs-v2.6.22-rc4-mm2-v18.patch

but indeed the -git diff serves you better if you updated -mm to Linus'
latest.

firstly, thanks a ton for your review feedback!

> - sys_sched_yield_to() went away? I guess I missed that.

yep. Nobody tried it and sent any feedback on it, it was causing
patch-logistical complications both in -mm and for packagers that bundle
CFS (the experimental-schedulers site has a CFS repo and Fedora rawhide
started carrying CFS recently as well), and i dont really agree with
adding yet another yield interface anyway. So we can and should do this
independently of CFS.

> - Curious. the simplification of task_tick_rt() seems to go only
> halfway. Could do
>
> if (p->policy != SCHED_RR)
> return;
>
> if (--p->time_slice)
> return;
>
> /* stuff goes here */

yeah. I have fixed it in my v19 tree for it to look like you suggest.

> - dud macro:
>
> #define is_rt_policy(p) ((p) == SCHED_FIFO || (p) == SCHED_RR)
>
> It evaluates its arg twice and could and should be coded in C.
>
> There are a bunch of other don't-need-to-be-implemented-as-a-macro
> macros around there too. Generally, I suggest you review all the
> patchset for macros-which-don't-need-to-be-macros.

yep, fixed. (is a historic macro)

> - Extraneous newline:
>
> enum cpu_idle_type
> {

fixed. (is a pre-existing enum)

> - Style thing:
>
> struct sched_entity {

> u64 sleep_start, sleep_start_fair;

fixed.

> - None of these fields have comments describing what they do ;)

one of them has ;-) Will fill this in.

> - __exit_signal() does apparently-unlocked 64-bit arith. Is there
> some implicit locking here or do we not care about the occasional
> race-induced inaccuracy?

do you mean the tsk->se.sum_exec_runtime addition, etc? That runs with
interrupts disabled so sum_sched_runtime is protected.

> (ditto, lots of places, I expect)

which places do you mean?

> (Gee, there's shitloads of 64-bit stuff in there. Does it all
> _really_ need to be 64-bit on 32-bit?)

yes - CFS is fundamentally designed for 64-bit, with still pretty OK
arithmetics performance for 32-bit.

> - weight_s64() (what does this do?) looks too big to inline on 32-bit.

ok, i've uninlined it.

> - update_stats_enqueue() looks too big to inline even on 64-bit.

done.

> - Overall, this change is tremendously huge for something which is
> supposedly ready-to-merge. [...]

hey, that's not fair, your review comments just made it 10K larger ;-)

> [...] Looks like a lot of that is the sched_entity conversion, but
> afaict there's quite a lot besides.
>
> - Should "4" in
>
> (sysctl_sched_features & 4)
>
> be enumerated?

yep, done.

> - Maybe even __check_preempt_curr_fair() is too porky to inline.

yep - undone.

> - There really is an astonishing amount of 64-bit arith in here...
>
> - Some opportunities for useful comments have been missed ;)
>
> #define NICE_0_LOAD SCHED_LOAD_SCALE
> #define NICE_0_SHIFT SCHED_LOAD_SHIFT
>
> <wonders what these mean>

SCHED_LOAD_SCALE is the smpnice stuff. CFS reuses that and also makes it
clear via this define that a nice-0 task has a 'load' contribution to
the CPU as of NICE_0_LOAD. Sometimes, when doing smpnice load-balancing
calculations we want to use 'SCHED_LOAD_SCALE', sometimes we want to
stress it's NICE_0_LOAD.

> - Should any of those new 64-bit arith functions in sched.c be pulled
> out and made generic?

yep, the plan is to put this all into reciprocal_div.h and to convert
existing users of reciprocal_div to the cleaner stuff from Thomas. The
patch wont get any smaller due to that though ;-)

> - update_curr_load() is huge, inlined and has several callsites?

this is a reasonable tradeoff i think - update_curr_load()'s slowpath is
in __update_curr_load(). Anyway, it probably wont get inlined when the
kernel is built with -Os and without forced-inlining.

> - lots more macros-which-dont-need-to-be-macros in sched.c:
> load_weight(), PRIO_TO_load_weight(), RTPRIO_TO_load_weight(), maybe
> others. People are more inclined to comment functions than they are
> macros, for some reason.

these are mostly ancient macros. I fixed up some of them in my current
tree.

> - inc_load(), dec_load(), inc_nr_running(), dec_nr_running(): these will
> generate plenty of code on 32-bit and they're all inlined with
> multiple callsites.

yep - i'll revisit the inlining picture. This is not really a primary
worry i think because it's easy to tweak and people can already express
their inlining preference via CONFIG_CC_OPTIMIZE_FOR_SIZE and
CONFIG_FORCED_INLINING.

> - overall, CFS takes sched.o from 41157 of .text up to 48781 on x86_64,
> which at 18% is rather a large bloat. Hopefully a lot of this is
> the new debug stuff.

> - On i386 sched.o went from 33755 up to 43660 which is 29% growth.
> Possibly acceptable, but why did it increase a lot more than the x86_64
> version? All that 64-bit arith, I assume?

the main reason is the sched debugging stuff:

text data bss dec hex filename
37570 2538 20 40128 9cc0 kernel/sched.o
30692 2426 20 33138 8172 kernel/sched-no_sched_debug.o

i can make it depend on CONFIG_SCHEDSTATS, although i'd prefer it to be
always on.

> - style (or the lack thereof):
>
> p->se.sum_wait_runtime = p->se.sum_sleep_runtime = 0;
> p->se.sleep_start = p->se.sleep_start_fair = p->se.block_start = 0;
> p->se.sleep_max = p->se.block_max = p->se.exec_max = p->se.wait_max = 0;
> p->se.wait_runtime_overruns = p->se.wait_runtime_underruns = 0;
>
> bit of an eyesore?

fixed. (this heap grew gradually and now is/was an eyesore indeed.)

> - in sched_init() this looks funny:
>
> rq->ls.load_update_last = sched_clock();
> rq->ls.load_update_start = sched_clock();
>
> was it intended that these both get the same value?

it doesnt really matter, i fixed them to be initialized to the same
'now' value.

i've attached my current fixes. (Please dont apply it yet.)

Ingo

Not-Signed-off-by: Ingo Molnar <mingo@xxxxxxx>

---
Makefile | 2
include/linux/sched.h | 120 +++++++++++++++++++++++++++++---------------------
kernel/exit.c | 2
kernel/sched.c | 58 ++++++++++++++++--------
kernel/sched_debug.c | 2
kernel/sched_fair.c | 61 ++++++++++++-------------
kernel/sched_rt.c | 15 +++---
7 files changed, 149 insertions(+), 111 deletions(-)

Index: linux/Makefile
===================================================================
--- linux.orig/Makefile
+++ linux/Makefile
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 22
-EXTRAVERSION = -rc6-cfs-v18
+EXTRAVERSION = -rc6-cfs-v19
NAME = Holy Dancing Manatees, Batman!

# *DOCUMENTATION*
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -528,31 +528,6 @@ struct signal_struct {
#define SIGNAL_STOP_CONTINUED 0x00000004 /* SIGCONT since WCONTINUED reap */
#define SIGNAL_GROUP_EXIT 0x00000008 /* group exit in progress */

-
-/*
- * Priority of a process goes from 0..MAX_PRIO-1, valid RT
- * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
- * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
- * values are inverted: lower p->prio value means higher priority.
- *
- * The MAX_USER_RT_PRIO value allows the actual maximum
- * RT priority to be separate from the value exported to
- * user-space. This allows kernel threads to set their
- * priority to a value higher than any user task. Note:
- * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
- */
-
-#define MAX_USER_RT_PRIO 100
-#define MAX_RT_PRIO MAX_USER_RT_PRIO
-
-#define MAX_PRIO (MAX_RT_PRIO + 40)
-#define DEFAULT_PRIO (MAX_RT_PRIO + 20)
-
-#define rt_prio(prio) unlikely((prio) < MAX_RT_PRIO)
-#define rt_task(p) rt_prio((p)->prio)
-#define is_rt_policy(p) ((p) == SCHED_FIFO || (p) == SCHED_RR)
-#define has_rt_policy(p) unlikely(is_rt_policy((p)->policy))
-
/*
* Some day this will be a full-fledged user tracking system..
*/
@@ -646,8 +621,7 @@ static inline int sched_info_on(void)
#endif
}

-enum cpu_idle_type
-{
+enum cpu_idle_type {
CPU_IDLE,
CPU_NOT_IDLE,
CPU_NEWLY_IDLE,
@@ -843,30 +817,45 @@ struct load_weight {
unsigned long weight, inv_weight;
};

-/* CFS stats for a schedulable entity (task, task-group etc) */
+/*
+ * CFS stats for a schedulable entity (task, task-group etc)
+ *
+ * Current field usage histogram:
+ *
+ * 4 se->block_start
+ * 4 se->run_node
+ * 4 se->sleep_start
+ * 4 se->sleep_start_fair
+ * 6 se->load.weight
+ * 7 se->delta_fair
+ * 15 se->wait_runtime
+ */
struct sched_entity {
- struct load_weight load; /* for nice- load-balancing purposes */
- int on_rq;
- struct rb_node run_node;
- unsigned long delta_exec;
- s64 delta_fair;
-
- u64 wait_start_fair;
- u64 wait_start;
- u64 exec_start;
- u64 sleep_start, sleep_start_fair;
- u64 block_start;
- u64 sleep_max;
- u64 block_max;
- u64 exec_max;
- u64 wait_max;
- u64 last_ran;
-
- s64 wait_runtime;
- u64 sum_exec_runtime;
- s64 fair_key;
- s64 sum_wait_runtime, sum_sleep_runtime;
- unsigned long wait_runtime_overruns, wait_runtime_underruns;
+ s64 wait_runtime;
+ s64 delta_fair;
+ struct load_weight load; /* for load-balancing */
+ struct rb_node run_node;
+ int on_rq;
+ unsigned long delta_exec;
+
+ u64 wait_start_fair;
+ u64 wait_start;
+ u64 exec_start;
+ u64 sleep_start;
+ u64 sleep_start_fair;
+ u64 block_start;
+ u64 sleep_max;
+ u64 block_max;
+ u64 exec_max;
+ u64 wait_max;
+ u64 last_ran;
+
+ u64 sum_exec_runtime;
+ s64 fair_key;
+ s64 sum_wait_runtime;
+ s64 sum_sleep_runtime;
+ unsigned long wait_runtime_overruns;
+ unsigned long wait_runtime_underruns;
};

struct task_struct {
@@ -1126,6 +1115,37 @@ struct task_struct {
#endif
};

+/*
+ * Priority of a process goes from 0..MAX_PRIO-1, valid RT
+ * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
+ * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
+ * values are inverted: lower p->prio value means higher priority.
+ *
+ * The MAX_USER_RT_PRIO value allows the actual maximum
+ * RT priority to be separate from the value exported to
+ * user-space. This allows kernel threads to set their
+ * priority to a value higher than any user task. Note:
+ * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
+ */
+
+#define MAX_USER_RT_PRIO 100
+#define MAX_RT_PRIO MAX_USER_RT_PRIO
+
+#define MAX_PRIO (MAX_RT_PRIO + 40)
+#define DEFAULT_PRIO (MAX_RT_PRIO + 20)
+
+static inline int rt_prio(int prio)
+{
+ if (unlikely(prio < MAX_RT_PRIO))
+ return 1;
+ return 0;
+}
+
+static inline int rt_task(struct task_struct *p)
+{
+ return rt_prio(p->prio);
+}
+
static inline pid_t process_group(struct task_struct *tsk)
{
return tsk->signal->pgrp;
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -290,7 +290,7 @@ static void reparent_to_kthreadd(void)
/* Set the exit signal to SIGCHLD so we signal init on exit */
current->exit_signal = SIGCHLD;

- if (!has_rt_policy(current) && (task_nice(current) < 0))
+ if (task_nice(current) < 0)
set_user_nice(current, 0);
/* cpus_allowed? */
/* rt_priority? */
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -106,6 +106,18 @@ unsigned long long __attribute__((weak))
#define MIN_TIMESLICE max(5 * HZ / 1000, 1)
#define DEF_TIMESLICE (100 * HZ / 1000)

+static inline int rt_policy(int policy)
+{
+ if (unlikely(policy == SCHED_FIFO) || unlikely(policy == SCHED_RR))
+ return 1;
+ return 0;
+}
+
+static inline int task_has_rt_policy(struct task_struct *p)
+{
+ return rt_policy(p->policy);
+}
+
/*
* This is the priority-queue data structure of the RT scheduling class:
*/
@@ -752,7 +764,7 @@ static void set_load_weight(struct task_
task_rq(p)->cfs.wait_runtime -= p->se.wait_runtime;
p->se.wait_runtime = 0;

- if (has_rt_policy(p)) {
+ if (task_has_rt_policy(p)) {
p->se.load.weight = prio_to_weight[0] * 2;
p->se.load.inv_weight = prio_to_wmult[0] >> 1;
return;
@@ -805,7 +817,7 @@ static inline int normal_prio(struct tas
{
int prio;

- if (has_rt_policy(p))
+ if (task_has_rt_policy(p))
prio = MAX_RT_PRIO-1 - p->rt_priority;
else
prio = __normal_prio(p);
@@ -1476,17 +1488,24 @@ int fastcall wake_up_state(struct task_s
*/
static void __sched_fork(struct task_struct *p)
{
- p->se.wait_start_fair = p->se.wait_start = p->se.exec_start = 0;
- p->se.sum_exec_runtime = 0;
- p->se.delta_exec = 0;
- p->se.delta_fair = 0;
-
- p->se.wait_runtime = 0;
-
- p->se.sum_wait_runtime = p->se.sum_sleep_runtime = 0;
- p->se.sleep_start = p->se.sleep_start_fair = p->se.block_start = 0;
- p->se.sleep_max = p->se.block_max = p->se.exec_max = p->se.wait_max = 0;
- p->se.wait_runtime_overruns = p->se.wait_runtime_underruns = 0;
+ p->se.wait_start_fair = 0;
+ p->se.wait_start = 0;
+ p->se.exec_start = 0;
+ p->se.sum_exec_runtime = 0;
+ p->se.delta_exec = 0;
+ p->se.delta_fair = 0;
+ p->se.wait_runtime = 0;
+ p->se.sum_wait_runtime = 0;
+ p->se.sum_sleep_runtime = 0;
+ p->se.sleep_start = 0;
+ p->se.sleep_start_fair = 0;
+ p->se.block_start = 0;
+ p->se.sleep_max = 0;
+ p->se.block_max = 0;
+ p->se.exec_max = 0;
+ p->se.wait_max = 0;
+ p->se.wait_runtime_overruns = 0;
+ p->se.wait_runtime_underruns = 0;

INIT_LIST_HEAD(&p->run_list);
p->se.on_rq = 0;
@@ -1799,7 +1818,7 @@ static void update_cpu_load(struct rq *t
int i, scale;

this_rq->nr_load_updates++;
- if (sysctl_sched_features & 64)
+ if (unlikely(!(sysctl_sched_features & SCHED_FEAT_PRECISE_CPU_LOAD)))
goto do_avg;

/* Update delta_fair/delta_exec fields first */
@@ -3801,7 +3820,7 @@ void set_user_nice(struct task_struct *p
* it wont have any effect on scheduling until the task is
* SCHED_FIFO/SCHED_RR:
*/
- if (has_rt_policy(p)) {
+ if (task_has_rt_policy(p)) {
p->static_prio = NICE_TO_PRIO(nice);
goto out_unlock;
}
@@ -3999,14 +4018,14 @@ recheck:
(p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
(!p->mm && param->sched_priority > MAX_RT_PRIO-1))
return -EINVAL;
- if (is_rt_policy(policy) != (param->sched_priority != 0))
+ if (rt_policy(policy) != (param->sched_priority != 0))
return -EINVAL;

/*
* Allow unprivileged RT tasks to decrease priority:
*/
if (!capable(CAP_SYS_NICE)) {
- if (is_rt_policy(policy)) {
+ if (rt_policy(policy)) {
unsigned long rlim_rtprio;
unsigned long flags;

@@ -6186,6 +6205,7 @@ int in_sched_functions(unsigned long add

void __init sched_init(void)
{
+ u64 now = sched_clock();
int highest_cpu = 0;
int i, j;

@@ -6206,8 +6226,8 @@ void __init sched_init(void)
rq->nr_running = 0;
rq->cfs.tasks_timeline = RB_ROOT;
rq->clock = rq->cfs.fair_clock = 1;
- rq->ls.load_update_last = sched_clock();
- rq->ls.load_update_start = sched_clock();
+ rq->ls.load_update_last = now;
+ rq->ls.load_update_start = now;

for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
rq->cpu_load[j] = 0;
Index: linux/kernel/sched_debug.c
===================================================================
--- linux.orig/kernel/sched_debug.c
+++ linux/kernel/sched_debug.c
@@ -157,7 +157,7 @@ static int sched_debug_show(struct seq_f
u64 now = ktime_to_ns(ktime_get());
int cpu;

- SEQ_printf(m, "Sched Debug Version: v0.03, cfs-v18, %s %.*s\n",
+ SEQ_printf(m, "Sched Debug Version: v0.03, cfs-v19, %s %.*s\n",
init_utsname()->release,
(int)strcspn(init_utsname()->version, " "),
init_utsname()->version);
Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -61,8 +61,24 @@ unsigned int sysctl_sched_stat_granulari
*/
unsigned int sysctl_sched_runtime_limit __read_mostly;

+/*
+ * Debugging: various feature bits
+ */
+enum {
+ SCHED_FEAT_IGNORE_PREEMPTED = 1,
+ SCHED_FEAT_DISTRIBUTE = 2,
+ SCHED_FEAT_FAIR_SLEEPERS = 4,
+ SCHED_FEAT_SLEEPER_AVG = 32,
+ SCHED_FEAT_PRECISE_CPU_LOAD = 64,
+ SCHED_FEAT_START_DEBIT = 128,
+ SCHED_FEAT_SKIP_INITIAL = 256,
+};
+
unsigned int sysctl_sched_features __read_mostly =
- 0 | 2 | 4 | 8 | 0 | 0 | 0 | 0;
+ SCHED_FEAT_DISTRIBUTE |
+ SCHED_FEAT_FAIR_SLEEPERS |
+ SCHED_FEAT_SLEEPER_AVG |
+ SCHED_FEAT_PRECISE_CPU_LOAD;

extern struct sched_class fair_sched_class;

@@ -145,7 +161,7 @@ static inline void
__dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
if (cfs_rq->rb_leftmost == &se->run_node)
- cfs_rq->rb_leftmost = NULL;
+ cfs_rq->rb_leftmost = rb_next(&se->run_node);
rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
update_load_sub(&cfs_rq->load, se->load.weight);
cfs_rq->nr_running--;
@@ -258,7 +274,7 @@ __update_curr(struct cfs_rq *cfs_rq, str
* Task already marked for preemption, do not burden
* it with the cost of not having left the CPU yet:
*/
- if (unlikely(sysctl_sched_features & 1))
+ if (unlikely(sysctl_sched_features & SCHED_FEAT_IGNORE_PREEMPTED))
if (unlikely(test_tsk_thread_flag(curtask, TIF_NEED_RESCHED)))
return;

@@ -305,7 +321,7 @@ update_stats_wait_start(struct cfs_rq *c
se->wait_start = now;
}

-static inline s64 weight_s64(s64 calc, unsigned long weight, int shift)
+static s64 weight_s64(s64 calc, unsigned long weight, int shift)
{
if (calc < 0) {
calc = - calc * weight;
@@ -317,7 +333,7 @@ static inline s64 weight_s64(s64 calc, u
/*
* Task is being enqueued - update stats:
*/
-static inline void
+static void
update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
{
s64 key;
@@ -438,7 +454,7 @@ static void distribute_fair_add(struct c
struct sched_entity *curr = cfs_rq_curr(cfs_rq);
s64 delta_fair = 0;

- if (!(sysctl_sched_features & 2))
+ if (!(sysctl_sched_features & SCHED_FEAT_DISTRIBUTE))
return;

if (cfs_rq->nr_running) {
@@ -469,7 +485,7 @@ __enqueue_sleeper(struct cfs_rq *cfs_rq,
* Fix up delta_fair with the effect of us running
* during the whole sleep period:
*/
- if (!(sysctl_sched_features & 32))
+ if (sysctl_sched_features & SCHED_FEAT_SLEEPER_AVG)
delta_fair = div64_s(delta_fair * load, load + se->load.weight);

delta_fair = weight_s64(delta_fair, se->load.weight, NICE_0_SHIFT);
@@ -495,7 +511,7 @@ enqueue_sleeper(struct cfs_rq *cfs_rq, s
s64 delta_fair;

if ((entity_is_task(se) && tsk->policy == SCHED_BATCH) ||
- !(sysctl_sched_features & 4))
+ !(sysctl_sched_features & SCHED_FEAT_FAIR_SLEEPERS))
return;

delta_fair = cfs_rq->fair_clock - se->sleep_start_fair;
@@ -574,7 +590,7 @@ static void dequeue_entity(struct cfs_rq
/*
* Preempt the current task with a newly woken task if needed:
*/
-static inline void
+static void
__check_preempt_curr_fair(struct cfs_rq *cfs_rq, struct sched_entity *se,
struct sched_entity *curr, unsigned long granularity)
{
@@ -612,23 +628,6 @@ put_prev_entity(struct cfs_rq *cfs_rq, s
int updated = 0;

/*
- * If the task is still waiting for the CPU (it just got
- * preempted), update its position within the tree and
- * start the wait period:
- */
- if ((sysctl_sched_features & 16) && entity_is_task(prev)) {
- struct task_struct *prevtask = task_of(prev);
-
- if (prev->on_rq &&
- test_tsk_thread_flag(prevtask, TIF_NEED_RESCHED)) {
-
- dequeue_entity(cfs_rq, prev, 0, now);
- enqueue_entity(cfs_rq, prev, 0, now);
- updated = 1;
- }
- }
-
- /*
* If still on the runqueue then deactivate_task()
* was not called and update_curr() has to be done:
*/
@@ -741,10 +740,8 @@ static void check_preempt_curr_fair(stru
unsigned long gran;

if (unlikely(rt_prio(p->prio))) {
- if (sysctl_sched_features & 8) {
- if (rt_prio(p->prio))
- update_curr(cfs_rq, rq_clock(rq));
- }
+ if (rt_prio(p->prio))
+ update_curr(cfs_rq, rq_clock(rq));
resched_task(curr);
return;
}
@@ -850,14 +847,14 @@ static void task_new_fair(struct rq *rq,
* The first wait is dominated by the child-runs-first logic,
* so do not credit it with that waiting time yet:
*/
- if (sysctl_sched_features & 256)
+ if (sysctl_sched_features & SCHED_FEAT_SKIP_INITIAL)
p->se.wait_start_fair = 0;

/*
* The statistical average of wait_runtime is about
* -granularity/2, so initialize the task with that:
*/
- if (sysctl_sched_features & 128)
+ if (sysctl_sched_features & SCHED_FEAT_START_DEBIT)
p->se.wait_runtime = -(s64)(sysctl_sched_granularity / 2);

__enqueue_entity(cfs_rq, se);
Index: linux/kernel/sched_rt.c
===================================================================
--- linux.orig/kernel/sched_rt.c
+++ linux/kernel/sched_rt.c
@@ -12,7 +12,7 @@ static inline void update_curr_rt(struct
struct task_struct *curr = rq->curr;
u64 delta_exec;

- if (!has_rt_policy(curr))
+ if (!task_has_rt_policy(curr))
return;

delta_exec = now - curr->se.exec_start;
@@ -179,13 +179,14 @@ static void task_tick_rt(struct rq *rq,
if (p->policy != SCHED_RR)
return;

- if (!(--p->time_slice)) {
- p->time_slice = static_prio_timeslice(p->static_prio);
- set_tsk_need_resched(p);
+ if (--p->time_slice)
+ return;

- /* put it at the end of the queue: */
- requeue_task_rt(rq, p);
- }
+ p->time_slice = static_prio_timeslice(p->static_prio);
+ set_tsk_need_resched(p);
+
+ /* put it at the end of the queue: */
+ requeue_task_rt(rq, p);
}

/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/