[RFC PATCH 0/10] Cleaning up the KVM clock mess

From: David Woodhouse
Date: Thu Apr 18 2024 - 15:41:28 EST


I lied, there aren't three different definitions of the KVM clock. The
fourth is on i386, where it's still based on CLOCK_MONOTONIC (well,
boot time which might as well be the same sine we offset it anyway).

If we fix *that* to be based on CLOCK_MONOTONIC_RAW then we can rip out
a whole bunch of mechanisms which were added to cope with NTP frequency
skew.

This cleans up the mess and gets us back down to the two unavoidable
definitions of the KVM clock: when the host and guest TSCs are well
behaved and in sync, it's in "master clock" mode where it's defined as
a simple arithmetic function of the guest TSC, otherwise it's clamped
to the host's CLOCK_MONOTONIC_RAW.

It includes Jack's KVM_[GS]ET_CLOCK_GUEST patch to allow accurate
migration. Also my KVM_VCPU_TSC_SCALE which exposes the precise TSC
scaling factors. This is needed to get accurate migration of the guest
TSC, and can *also* be used by userspace to have vDSO-style access to
the KVM clock. Thus allowing hypercalls and other emulated clock devices
(e.g. PIT, HPET, ACPI timer) to be based on the KVM clock too, giving
*consistency* across a live migration.

I do still need to fix KVM_REQ_MASTERCLOCK_UPDATE so that it doesn't
clamp the clock back to CLOCK_MONOTONIC_RAW; we should *update* the
ka->kvmclock_offset when we've been running in use_master_clock mode.
Should probably do that in kvm_arch_hardware_disable() for timekeeping
across hibernation too, but I haven't finished working that one out.

I think there are still some places left where KVM reads the time twice
in close(ish) succession and then assumes they were at the *same* time,
which I'll audit and fix too.

I also need to flesh out the test cases and do some real testing from
VMMs, but I think it's ready for some heckling at least.

https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/clocks

David Woodhouse (8):
KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
KVM: x86: Remove periodic global clock updates
KVM: x86: Kill KVM_REQ_GLOBAL_CLOCK_UPDATE
KVM: x86: Fix KVM clock precision in __get_kvmclock()

Jack Allister (2):
KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
KVM: selftests: Add KVM/PV clock selftest to prove timer correction

Documentation/virt/kvm/api.rst | 37 ++
Documentation/virt/kvm/devices/vcpu.rst | 115 ++++--
arch/x86/include/asm/kvm_host.h | 8 +-
arch/x86/include/uapi/asm/kvm.h | 6 +
arch/x86/kvm/svm/svm.c | 3 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 438 +++++++++++++---------
arch/x86/kvm/xen.c | 4 +-
include/uapi/linux/kvm.h | 3 +
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++++++
11 files changed, 600 insertions(+), 209 deletions(-)