[PATCH] sched: make sure sched_child_runs_first is not a fairy tale
From: marywangran
Date: Tue Apr 28 2009 - 03:49:28 EST
The CFS scheduler has been the main scheduler since 2.6.23: everything is
fair, with no starvation and no complexity. A new task is no longer simply
queued at the head of the runqueue so that it quickly preempts current.
With the 2.6.28 code, if you clear the START_DEBIT bit with sysctl -w
kernel.sched_features=orig_value&~START_DEBIT_bit, the child task does not
always preempt its parent, and the problem is easier to reproduce if the
parent task has a lower nice value. My test file is:
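/*******child_first.c**********/
#define _GNU_SOURCE	/* sched_setaffinity(), CPU_ZERO(), CPU_SET() */
#include <sched.h>
#include <stdio.h>	/* printf(), fprintf(), perror() */
#include <stdlib.h>	/* atoi(), exit() */
#include <sys/types.h>
#include <unistd.h>	/* fork(), nice() */
int main(int argc, char *argv[])
{
	cpu_set_t mask;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <nice increment>\n", argv[0]);
		return 1;
	}
	/* Pin to CPU 0: the child-runs-first path only applies when the
	 * child is on the same CPU as the parent. */
	CPU_ZERO(&mask);
	CPU_SET(0, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
		perror("sched_setaffinity");
	int v = atoi(argv[1]);
	nice(v);	/* negative increments require root */
	/* Burn some CPU so the parent accumulates vruntime; volatile keeps
	 * the compiler from collapsing the loop. */
	volatile int i = 90000;
	while (i-- > 0)
		v++;
	if (fork() == 0) {
		printf("sub\n");
		exit(0);
	}
	printf("main,%d\n", v);
	return 0;
}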
Compile it to child_first and run the following:
[root@zhaoya ~]#sysctl -w kernel.sched_features=0
[root@zhaoya ~]#./child_first -20
[root@zhaoya ~]#./child_first -xx
...
[root@zhaoya ~]#./child_first 10000...
After all this, believe your eyes.
This happens because the condition that decides whether the child should
preempt the parent is sloppy. With START_DEBIT cleared, the child starts at
cfs_rq->min_vruntime. If the parent's nice value is very low and nr_running
is very small, cfs_rq->min_vruntime is always equal to the parent's
vruntime, so {curr->vruntime <= se->vruntime}. If the nice value is high,
cfs_rq->min_vruntime is always less than the parent's vruntime, so
{cfs_rq->min_vruntime <= curr->vruntime}. In neither case does the old test
curr->vruntime < se->vruntime hold, so the child never gets to run first.
Signed-off-by: Ya Zhao <marywangran@xxxxxxxxx>
---
--- linux-2.6.28.1/kernel/sched_fair.c.orig 2009-04-28 22:26:00.000000000 +0800
+++ linux-2.6.28.1/kernel/sched_fair.c 2009-04-28 22:34:49.000000000 +0800
@@ -1628,12 +1628,13 @@ static void task_new_fair(struct rq *rq,
/* 'curr' will be NULL if the child belongs to a different group */
if (sysctl_sched_child_runs_first && this_cpu == task_cpu(p) &&
- curr && curr->vruntime < se->vruntime) {
+ curr && (curr->vruntime <= se->vruntime || cfs_rq->min_vruntime <= curr->vruntime)) {
/*
* Upon rescheduling, sched_class::put_prev_task() will place
* 'current' within the tree based on its new key value.
*/
- swap(curr->vruntime, se->vruntime);
+ if (curr->vruntime < se->vruntime)
+ swap(curr->vruntime, se->vruntime);
resched_task(rq->curr);
}