Re: [PATCH][RFC] kvm-scheduler integration

From: Avi Kivity
Date: Sun Jul 08 2007 - 09:16:24 EST


Ingo Molnar wrote:
* Avi Kivity <avi@xxxxxxxxxxxx> wrote:

Intel VT essentially introduces a new set of registers into the processor; this means we cannot preempt kvm in kernel mode lest a new VM run with and old VM's registers. In addition, kvm lazy switches some host registers as well. (AMD does not introduce new registers, but we still want lazy msr switching, and we want to know when we move to a different cpu in order to be able to guarantee a monotonously increasing tsc).

Current kvm code simply disables preemption when guest context is in use. This, however, has many drawbacks:

- some kvm mmu code is O(n), causing possibly unbounded latencies and causing
-rt great unhappiness.
- the mmu code wants to sleep (especially with guest paging), but can't.
- some optimizations are not possible; for example, if we switch from one
VM to another, we need not restore some host registers (as they will simply
be overwritten with the new guest registers immediately).

This patch adds hooks to the scheduler that allow kvm to be notified about scheduling decisions involving virtual machines. When we schedule out a VM, kvm is told to swap guest registers out; when we schedule the VM in, we swap the registers back in.

hm, why not do what i have in -rt? See the patch below. Seems to work fine for me, although i might be missing something.


How can this work with >1 VM? We need to execute vmptrld with the new VM's vmcs before touching any VT registers.

Recent kvm also keeps the guest syscall msrs on the host, so if you switch context to a process that then issues a syscall, you enter the host kernel at the guest syscall entry point... makes for nice stacktraces.

Ingo

------------------------->
Subject: [patch] kvm: make vcpu_load/put preemptible
From: Ingo Molnar <mingo@xxxxxxx>

make vcpu_load/put preemptible.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
drivers/kvm/svm.c | 13 ++++++++++---
drivers/kvm/vmx.c | 15 ++++++++++++---
2 files changed, 22 insertions(+), 6 deletions(-)

Index: linux-rt-rebase.q/drivers/kvm/svm.c
===================================================================
--- linux-rt-rebase.q.orig/drivers/kvm/svm.c
+++ linux-rt-rebase.q/drivers/kvm/svm.c
@@ -610,9 +610,17 @@ static void svm_free_vcpu(struct kvm_vcp
static void svm_vcpu_load(struct kvm_vcpu *vcpu)
{
- int cpu, i;
+ int cpu = raw_smp_processor_id(), i;
+ cpumask_t this_mask = cpumask_of_cpu(cpu);
+
+ /*
+ * Keep the context preemptible, but do not migrate
+ * away to another CPU. TODO: make sure this persists.
+ * Save/restore original mask.
+ */
+ if (unlikely(!cpus_equal(current->cpus_allowed, this_mask)))
+ set_cpus_allowed(current, cpumask_of_cpu(cpu));
- cpu = get_cpu();
if (unlikely(cpu != vcpu->cpu)) {
u64 tsc_this, delta;
@@ -638,7 +646,6 @@ static void svm_vcpu_put(struct kvm_vcpu
wrmsrl(host_save_user_msrs[i], vcpu->svm->host_user_msrs[i]);
rdtscll(vcpu->host_tsc);
- put_cpu();
}
static void svm_vcpu_decache(struct kvm_vcpu *vcpu)
Index: linux-rt-rebase.q/drivers/kvm/vmx.c
===================================================================
--- linux-rt-rebase.q.orig/drivers/kvm/vmx.c
+++ linux-rt-rebase.q/drivers/kvm/vmx.c
@@ -241,9 +241,16 @@ static void vmcs_set_bits(unsigned long static void vmx_vcpu_load(struct kvm_vcpu *vcpu)
{
u64 phys_addr = __pa(vcpu->vmcs);
- int cpu;
+ int cpu = raw_smp_processor_id();
+ cpumask_t this_mask = cpumask_of_cpu(cpu);
- cpu = get_cpu();
+ /*
+ * Keep the context preemptible, but do not migrate
+ * away to another CPU. TODO: make sure this persists.
+ * Save/restore original mask.
+ */
+ if (unlikely(!cpus_equal(current->cpus_allowed, this_mask)))
+ set_cpus_allowed(current, cpumask_of_cpu(cpu));
if (vcpu->cpu != cpu)
vcpu_clear(vcpu);
@@ -281,7 +288,6 @@ static void vmx_vcpu_load(struct kvm_vcp
static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
{
kvm_put_guest_fpu(vcpu);
- put_cpu();
}
static void vmx_vcpu_decache(struct kvm_vcpu *vcpu)
@@ -1862,6 +1868,7 @@ again:
}
#endif
+ preempt_disable();
asm (
/* Store host registers */
"pushf \n\t"
@@ -2002,6 +2009,8 @@ again:
reload_tss();
}
+ preempt_enable();
+
++vcpu->stat.exits;
#ifdef CONFIG_X86_64


--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/