[tip:x86/asm] x86/kvm: On KVM re-enable (e.g. after suspend), update clocks

From: tip-bot for Andy Lutomirski
Date: Mon Dec 14 2015 - 03:17:34 EST


Commit-ID: 677a73a9aa5433ea728200c26a7b3506d5eaa92b
Gitweb: http://git.kernel.org/tip/677a73a9aa5433ea728200c26a7b3506d5eaa92b
Author: Andy Lutomirski <luto@xxxxxxxxxx>
AuthorDate: Thu, 10 Dec 2015 19:20:18 -0800
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Fri, 11 Dec 2015 08:56:02 +0100

x86/kvm: On KVM re-enable (e.g. after suspend), update clocks

This gets rid of the "did TSC go backwards" logic and just
updates all clocks. It should work better (no more disabling of
fast timing) and more reliably (all of the clocks are actually
updated).

Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Brian Gerst <brgerst@xxxxxxxxx>
Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: linux-mm@xxxxxxxxx
Link: http://lkml.kernel.org/r/861716d768a1da6d1fd257b7972f8df13baf7f85.1449702533.git.luto@xxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
arch/x86/kvm/x86.c | 75 +++---------------------------------------------------
1 file changed, 3 insertions(+), 72 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00462bd..6e32e87 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -123,8 +123,6 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
unsigned int __read_mostly lapic_timer_advance_ns = 0;
module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR);

-static bool __read_mostly backwards_tsc_observed = false;
-
#define KVM_NR_SHARED_MSRS 16

struct kvm_shared_msrs_global {
@@ -1671,7 +1669,6 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
&ka->master_cycle_now);

ka->use_master_clock = host_tsc_clocksource && vcpus_matched
- && !backwards_tsc_observed
&& !ka->boot_vcpu_runs_old_kvmclock;

if (ka->use_master_clock)
@@ -7366,88 +7363,22 @@ int kvm_arch_hardware_enable(void)
struct kvm_vcpu *vcpu;
int i;
int ret;
- u64 local_tsc;
- u64 max_tsc = 0;
- bool stable, backwards_tsc = false;

kvm_shared_msr_cpu_online();
ret = kvm_x86_ops->hardware_enable();
if (ret != 0)
return ret;

- local_tsc = rdtsc();
- stable = !check_tsc_unstable();
list_for_each_entry(kvm, &vm_list, vm_list) {
kvm_for_each_vcpu(i, vcpu, kvm) {
- if (!stable && vcpu->cpu == smp_processor_id())
+ if (vcpu->cpu == smp_processor_id()) {
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
- if (stable && vcpu->arch.last_host_tsc > local_tsc) {
- backwards_tsc = true;
- if (vcpu->arch.last_host_tsc > max_tsc)
- max_tsc = vcpu->arch.last_host_tsc;
+ kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE,
+ vcpu);
}
}
}

- /*
- * Sometimes, even reliable TSCs go backwards. This happens on
- * platforms that reset TSC during suspend or hibernate actions, but
- * maintain synchronization. We must compensate. Fortunately, we can
- * detect that condition here, which happens early in CPU bringup,
- * before any KVM threads can be running. Unfortunately, we can't
- * bring the TSCs fully up to date with real time, as we aren't yet far
- * enough into CPU bringup that we know how much real time has actually
- * elapsed; our helper function, get_kernel_ns() will be using boot
- * variables that haven't been updated yet.
- *
- * So we simply find the maximum observed TSC above, then record the
- * adjustment to TSC in each VCPU. When the VCPU later gets loaded,
- * the adjustment will be applied. Note that we accumulate
- * adjustments, in case multiple suspend cycles happen before some VCPU
- * gets a chance to run again. In the event that no KVM threads get a
- * chance to run, we will miss the entire elapsed period, as we'll have
- * reset last_host_tsc, so VCPUs will not have the TSC adjusted and may
- * loose cycle time. This isn't too big a deal, since the loss will be
- * uniform across all VCPUs (not to mention the scenario is extremely
- * unlikely). It is possible that a second hibernate recovery happens
- * much faster than a first, causing the observed TSC here to be
- * smaller; this would require additional padding adjustment, which is
- * why we set last_host_tsc to the local tsc observed here.
- *
- * N.B. - this code below runs only on platforms with reliable TSC,
- * as that is the only way backwards_tsc is set above. Also note
- * that this runs for ALL vcpus, which is not a bug; all VCPUs should
- * have the same delta_cyc adjustment applied if backwards_tsc
- * is detected. Note further, this adjustment is only done once,
- * as we reset last_host_tsc on all VCPUs to stop this from being
- * called multiple times (one for each physical CPU bringup).
- *
- * Platforms with unreliable TSCs don't have to deal with this, they
- * will be compensated by the logic in vcpu_load, which sets the TSC to
- * catchup mode. This will catchup all VCPUs to real time, but cannot
- * guarantee that they stay in perfect synchronization.
- */
- if (backwards_tsc) {
- u64 delta_cyc = max_tsc - local_tsc;
- backwards_tsc_observed = true;
- list_for_each_entry(kvm, &vm_list, vm_list) {
- kvm_for_each_vcpu(i, vcpu, kvm) {
- vcpu->arch.tsc_offset_adjustment += delta_cyc;
- vcpu->arch.last_host_tsc = local_tsc;
- kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
- }
-
- /*
- * We have to disable TSC offset matching.. if you were
- * booting a VM while issuing an S4 host suspend....
- * you may have some problem. Solving this issue is
- * left as an exercise to the reader.
- */
- kvm->arch.last_tsc_nsec = 0;
- kvm->arch.last_tsc_write = 0;
- }
-
- }
return 0;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/