Re: [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks

From: Andy Lutomirski
Date: Wed Mar 16 2016 - 18:16:16 EST


On Wed, Mar 16, 2016 at 3:06 PM, Radim Krcmar <rkrcmar@xxxxxxxxxx> wrote:
> 2015-12-09 15:12-0800, Andy Lutomirski:
>> This gets rid of the "did TSC go backwards" logic and just updates
>> all clocks. It should work better (no more disabling of fast
>> timing) and more reliably (all of the clocks are actually updated).
>>
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
>> ---
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -7369,88 +7366,22 @@ int kvm_arch_hardware_enable(void)
>> list_for_each_entry(kvm, &vm_list, vm_list) {
>> kvm_for_each_vcpu(i, vcpu, kvm) {
>> + if (vcpu->cpu == smp_processor_id()) {
>
> (vmm_exclusive sets vcpu->cpu to -1, so KVM_REQ_MASTERCLOCK_UPDATE might
> not run, but vmm_exclusive probably doesn't work anyway.)
>
>> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>> + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
>> + vcpu);
>> }
>
> (Requesting KVM_REQ_MASTERCLOCK_UPDATE once per VM is enough.)
>
>> - if (backwards_tsc) {
>> - u64 delta_cyc = max_tsc - local_tsc;
>> - backwards_tsc_observed = true;
>> - list_for_each_entry(kvm, &vm_list, vm_list) {
>> - kvm_for_each_vcpu(i, vcpu, kvm) {
>> - vcpu->arch.tsc_offset_adjustment += delta_cyc;
>> - vcpu->arch.last_host_tsc = local_tsc;
>
> tsc_offset_adjustment was set for
>
> /* Apply any externally detected TSC adjustments (due to suspend) */
> if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
> adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
> vcpu->arch.tsc_offset_adjustment = 0;
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> }
>
> Guest TSC is going to jump backward with this patch, which would make
> the guest think that a lot of cycles passed. This has no bearing on
> guest timekeeping, because the guest shouldn't be using raw TSC.
> If we wanted to do something though, there are at least two options:
> 1) Fake that TSC continued at roughly its specified rate: compute how
> many cycles could have elapsed while the CPU was suspended (using
> host time before/after suspend and guest TSC frequency) and adjust
> guest TSC.
> 2) Resume guest TSC at its last cycle before suspend.
> (Roughly what KVM does now.)
>
> What are your opinions on TSC faking?

I'd suggest restarting it wherever it left off, because it's simpler.
If there was a CLOCK_BOOT_RAW, you could try to track it, but I'm not
sure that such a thing exists.

FWIW, if you ever intend to support ART ("always running timer")
passthrough, this is going to be a giant clusterfsck. Good luck. I
haven't gotten a straight answer as to what hardware actually supports
that thing, so even testing isn't no easy.

>
> Thanks.
>
>
> ---
> Btw. I'll be spending some days to decipher kvmclock, so I'd also fix
> the masterclock+suspend issue, if you don't mind ... So far, I don't
> even see a reason to update kvmclock on kvm_arch_hardware_enable().
> Suspend is a condition that we want to handle, so kvm_resume would be a
> better place, but we handle suspend only because TSC and timekeeping has
> changed, so I think that the right place is in their event notifiers.

I'd be glad to try to review things. Please cc me.

One of the Xen people pointed me at the MS Viridian spec for handling
TSC rate changes on migration to or from hosts that don't support TSC
scaling. I wonder if KVM could use the same technique or even the
same API.

--Andy