[PATCH 16/16] KVM-GST: adjust scheduler cpu power

From: Glauber Costa
Date: Mon Jan 24 2011 - 13:08:22 EST


This is a first proposal for using steal time information
to influence the scheduler. There are a lot of optimizations
and fine grained adjustments to be done, but it is working reasonably
so far for me (mostly)

With this patch (and some host pinnings to demonstrate the situation),
two vcpus with very different steal time (Say 80 % vs 1 %) will not get
an even distribution of processes. This is a situation that can naturally
arise, specially in overcommited scenarios. Previosly, the guest scheduler
would wrongly think that all cpus have the same ability to run processes,
lowering the overall throughput.

Signed-off-by: Glauber Costa <glommer@xxxxxxxxxx>
CC: Rik van Riel <riel@xxxxxxxxxx>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CC: Avi Kivity <avi@xxxxxxxxxx>
---
include/linux/sched.h | 1 +
kernel/sched.c | 4 ++++
kernel/sched_fair.c | 10 ++++++++++
3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d747f94..beab72d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -302,6 +302,7 @@ long io_schedule_timeout(long timeout);
extern void cpu_init (void);
extern void trap_init(void);
extern void update_process_times(int user);
+extern cputime_t (*hypervisor_steal_time)(void);
extern void scheduler_tick(void);

extern void sched_show_task(struct task_struct *p);
diff --git a/kernel/sched.c b/kernel/sched.c
index 3b3e88d..c460f0d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -501,6 +501,8 @@ struct rq {
struct sched_domain *sd;

unsigned long cpu_power;
+ unsigned long real_ticks;
+ unsigned long total_ticks;

unsigned char idle_at_tick;
/* For active balancing */
@@ -3533,10 +3535,12 @@ static int touch_steal_time(int is_idle)
if (is_idle)
return 0;

+ rq->total_ticks++;
if (steal) {
account_steal_time(steal);
return 1;
}
+ rq->real_ticks++;
return 0;
}

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index c62ebae..1364c28 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2509,6 +2509,16 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
power >>= SCHED_LOAD_SHIFT;
}

+ WARN_ON(cpu_rq(cpu)->real_ticks > cpu_rq(cpu)->total_ticks);
+
+ if (cpu_rq(cpu)->total_ticks) {
+ power *= cpu_rq(cpu)->real_ticks;
+ power /= cpu_rq(cpu)->total_ticks;
+ }
+
+ cpu_rq(cpu)->total_ticks = 0;
+ cpu_rq(cpu)->real_ticks = 0;
+
sdg->cpu_power_orig = power;

if (sched_feat(ARCH_POWER))
--
1.7.2.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/