Re: kvmclock - the host part.

From: Glauber de Oliveira Costa
Date: Wed Nov 07 2007 - 08:09:56 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Avi Kivity escreveu:
> Glauber de Oliveira Costa wrote:
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writting to it periodically.
>>
>> Signed-off-by: Glauber de Oliveira Costa <gcosta@xxxxxxxxxx>
>> ---
>> drivers/kvm/irq.c | 1 +
>> drivers/kvm/kvm_main.c | 2 +
>> drivers/kvm/svm.c | 1 +
>> drivers/kvm/vmx.c | 1 +
>> drivers/kvm/x86.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
>> drivers/kvm/x86.h | 13 ++++++++++
>> 6 files changed, 77 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
>> index 22bfeee..0344879 100644
>> --- a/drivers/kvm/irq.c
>> +++ b/drivers/kvm/irq.c
>> @@ -92,6 +92,7 @@ void kvm_vcpu_kick_request(struct kvm_vcpu *vcpu, int request)
>>
>> void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
>> {
>> + vcpu->time_needs_update = 1;
>>
>
> Why here and not in __vcpu_run()? It isn't timer irq related.
Because my plan was exactly, updating it at each timer interrupt.
There's a trade off between
updating every run (hopefully more precision, but more overhead), versus
updating at timer irqs, or other events.

What would you prefer?

>> @@ -1242,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
>> case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
>> case KVM_CAP_USER_MEMORY:
>> case KVM_CAP_SET_TSS_ADDR:
>> + case KVM_CAP_CLK:
>>
>
> It's just a clock source now, right? so _CLOCK_SOURCE.
Right.

>> +static void kvm_write_guest_time(struct kvm_vcpu *vcpu)
>> +{
>> + struct timespec ts;
>> + void *clock_addr;
>> +
>> +
>> + if (!vcpu->clock_page)
>> + return;
>> +
>> + /* Updates version to the next odd number, indicating we're writing */
>> + vcpu->hv_clock.version++;
>>
>
> No one can actually see this as you're updating a private structure.
> You need to copy it to guestspace.
That's true, I'm just copying it at the end, the whole thing. thanks.

>> + /* Updating the tsc count is the first thing we do */
>> + kvm_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &vcpu->hv_clock.last_tsc);
>> + ktime_get_ts(&ts);
>> + vcpu->hv_clock.now_ns = ts.tv_nsec + (NSEC_PER_SEC * (u64)ts.tv_sec);
>> + vcpu->hv_clock.wc_sec = get_seconds();
>> + vcpu->hv_clock.version++;
>> +
>> + clock_addr = vcpu->clock_addr;
>> + memcpy(clock_addr, &vcpu->hv_clock, sizeof(vcpu->hv_clock));
>> + mark_page_dirty(vcpu->kvm, vcpu->clock_gfn);
>>
>
> Just use kvm_write_guest().
Too slow. Updating guest time, even only in timer interrupts, was a too
frequent operation, and the kmap / kunmap (atomic) at every iteration
deemed the whole thing
unusable.

>> +
>> + vcpu->time_needs_update = 0;
>> +}
>> +
>> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> {
>> unsigned long nr, a0, a1, a2, a3, ret;
>> @@ -1648,7 +1674,33 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> a3 &= 0xFFFFFFFF;
>> }
>>
>> + ret = 0;
>> switch (nr) {
>> + case KVM_HCALL_REGISTER_CLOCK: {
>> + struct kvm_vcpu *dst_vcpu;
>> +
>> + if (!((a1 < KVM_MAX_VCPUS) && (vcpu->kvm->vcpus[a1]))) {
>> + ret = -KVM_EINVAL;
>> + break;
>> + }
>> +
>> + dst_vcpu = vcpu->kvm->vcpus[a1];
>>
>
> What if !dst_vcpu? What about locking?
>
> Suggest simply using vcpu. Every guest cpu can register its own
Earlier version had a check for !dst_vcpu, you are absolutely right.

Locking was not a problem in practice, because these operations are done
serialized, by the same cpu.

This hypercall is called by cpu_up, which, at least in the beginning,
it's called by cpu0. And that's why each vcpu cannot register its own.
(And why we don't need locking).

Well, theorectically each vcpu do can register its own clocksource, it
will just be a little bit more complicated, we have to fire out an IPI,
and have the other cpu to catch it, and call the hypercall.

But I honestly don't like it.
Usually, the cpu leaves start_secondary with a clock already registered,
so the kernel relies on it.

>
>> + dst_vcpu->clock_page = gfn_to_page(vcpu->kvm, a0 >> PAGE_SHIFT);
>>
>
> Shift right? Why?
a0 is not a gfn, but a physical address.

>> +
>> + if (!dst_vcpu->clock_page) {
>>
>
> IIRC gfn_to_page() never returns NULL, need a different check.
You are right. I developed part of it in an older version of kvm, where
reality was:

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
struct kvm_memory_slot *slot;

gfn = unalias_gfn(kvm, gfn);
slot = __gfn_to_memslot(kvm, gfn);
if (!slot)
return NULL;
return slot->phys_mem[gfn - slot->base_gfn];
}

Will update.

>> + ret = -KVM_EINVAL;
>> + break;
>> + }
>> + dst_vcpu->clock_gfn = a0 >> PAGE_SHIFT;
>> +
>> + dst_vcpu->hv_clock.tsc_mult = clocksource_khz2mult(tsc_khz, 22);
>> + dst_vcpu->clock_addr = kmap(dst_vcpu->clock_page);
>>
>
> kmap() is bad since the page can move due to swapping.
> kvm_write_guest() is your friend.
Yeah , right, but again: It will be slow to the point of making the
whole thing not worthy. So what alternatives do we get?

>> +static inline void release_clock(struct kvm_vcpu *vcpu)
>> +{
>> + if (vcpu->clock_page)
>> + kunmap(vcpu->clock_page);
>> +}
>>
>
>
> While it's a static inline, please prefix with kvm_ in case one day it
> isn't.
>

Okay, will do.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbjDjYI8LaFUWXMRAvcnAKCZOtPqHAxvcUkAfSaOezPWq1ib2wCg1TNz
fT1rt86/j25K/6lmFsRmbI0=
=nSkW
-----END PGP SIGNATURE-----
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/