[PATCH 0/2] RFC: Precise TSC migration
From: Maxim Levitsky
Date: Mon Nov 30 2020 - 08:38:01 EST
Hi!
This is the first version of the work to make TSC migration more accurate,
as outlined by Paolo at:
https://www.spinics.net/lists/kvm/msg225525.html
I have a few thoughts about the kvm masterclock synchronization,
which relate to Paolo's proposal that I implemented.
The idea of the masterclock is that when the host TSC is synchronized
(or, as the kernel calls it, stable), and the guest TSC is synchronized as well,
then we can base the kvmclock on the same pair of
(host time in nsec, host tsc value) for all vCPUs.
This makes the random error in the calculation of this value invariant
across vCPUs, and allows the guest to do the kvmclock calculation in userspace
(vDSO), since the kvmclock parameters are vCPU invariant.
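
To illustrate why a shared (host time in nsec, host tsc value) pair makes the
result vCPU invariant, here is a minimal sketch of a pvclock-style read. The
struct and function names are hypothetical and simplified for this example;
they are not the kernel's definitions, and the version/flags handling of the
real ABI is left out.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-vCPU kvmclock parameters. */
struct clock_params {
	uint64_t tsc_timestamp;   /* host TSC value of the reference pair */
	uint64_t system_time_ns;  /* host time in nsec of the reference pair */
	uint32_t tsc_to_ns_mul;   /* 32-bit fractional multiplier (value / 2^32) */
	int8_t   tsc_shift;       /* shift applied to the TSC delta before scaling */
};

/* Convert a TSC reading to nanoseconds using the reference pair. */
static uint64_t clock_read_ns(const struct clock_params *p, uint64_t tsc_now)
{
	uint64_t delta = tsc_now - p->tsc_timestamp;
	uint64_t hi, lo;

	if (p->tsc_shift >= 0)
		delta <<= p->tsc_shift;
	else
		delta >>= -p->tsc_shift;

	/* (delta * mul) >> 32, split in two halves to avoid a 128-bit product */
	hi = delta >> 32;
	lo = delta & 0xffffffffULL;
	return p->system_time_ns +
	       hi * p->tsc_to_ns_mul + ((lo * p->tsc_to_ns_mul) >> 32);
}
```

If every vCPU is given the same clock_params, any two vCPUs that read the same
TSC value compute the same time, so the calculation no longer depends on which
vCPU the caller happens to run on.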
To ensure that the guest TSC is synchronized, we currently track host/guest TSC
writes, and enable the master clock only when roughly the same guest TSC value
was written across all vCPUs.
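
As a deliberately simplified model of this kind of heuristic (not the actual
KVM code, which also accounts for the time elapsed between writes), the check
can be pictured as follows. The function name and the tolerance parameter are
assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: report "synchronized" when the last guest TSC value
 * written on each vCPU is within `tolerance` ticks of the value written
 * on vCPU 0.
 */
static bool tsc_writes_match(const uint64_t *written, size_t nr_vcpus,
			     uint64_t tolerance)
{
	for (size_t i = 1; i < nr_vcpus; i++) {
		uint64_t a = written[0], b = written[i];
		uint64_t diff = a > b ? a - b : b - a;

		if (diff > tolerance)
			return false;
	}
	return true;
}
```

A single vCPU that writes a different value makes the check fail, which is
exactly the situation where guessing the guest's intent becomes fragile.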
Recently this was disabled by Paolo, and I agree with this, because I think
that we indeed should only make the guest TSC synchronized by default
(including newly hotplugged vCPUs) and not do any TSC synchronization beyond that.
(Trying to guess when the guest syncs the TSC can cause more harm than good.)
Besides, Linux guests don't sync the TSC via IA32_TSC writes,
but rather use IA32_TSC_ADJUST, which currently doesn't participate
in the TSC sync heuristics.
And as far as I know, Linux guests are the primary (only?) users of the kvmclock.
I *do think* however that we should redefine KVM_CLOCK_TSC_STABLE
in the documentation to state that it only guarantees invariance if the guest
doesn't mess with its own TSC.
Also, I think we should consider enabling X86_FEATURE_TSC_RELIABLE
in the guest kernel when kvm is detected, to keep the guest from even trying
to sync the TSC on newly hotplugged vCPUs.
(The guest doesn't usually end up touching TSC_ADJUST, but it still might
in some cases due to scheduling of guest vCPUs.)
(X86_FEATURE_TSC_RELIABLE short-circuits both TSC synchronization on CPU hotplug
and the TSC clocksource watchdog, and the latter we might want to keep.)
For host TSC writes, just as Paolo proposed, we can still do the TSC sync,
unless the new code that I implemented is in use.
A few more random notes:
I have a weird feeling about using 'nsec since 1 January 1970'.
Common sense tells me that a 64-bit value can hold about 580 years,
but I still see that it is more common to use timespec, which is a (sec, nsec) pair.
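
A quick sanity check of that estimate, as a small sketch. The helper names and
the ns_pair struct are made up for this example, and a 365-day year is assumed.
Note that the roughly 580 year figure applies to an unsigned 64-bit count; a
signed one covers about half of that.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define SEC_PER_YEAR (365ULL * 24 * 60 * 60) /* 365-day year, no leap days */

/* Whole years representable by an unsigned 64-bit nanosecond count. */
static uint64_t u64_ns_range_years(void)
{
	return UINT64_MAX / NSEC_PER_SEC / SEC_PER_YEAR;
}

/* Whole years representable by a signed 64-bit nanosecond count. */
static uint64_t s64_ns_range_years(void)
{
	return (uint64_t)INT64_MAX / NSEC_PER_SEC / SEC_PER_YEAR;
}

/* Hypothetical (sec, nsec) pair, analogous to a timespec. */
struct ns_pair {
	uint64_t sec;
	uint32_t nsec;
};

/* Split 'nsec since 1 January 1970' into a (sec, nsec) pair. */
static struct ns_pair ns_to_pair(uint64_t ns)
{
	struct ns_pair p = { ns / NSEC_PER_SEC, (uint32_t)(ns % NSEC_PER_SEC) };
	return p;
}
```

So both representations carry the same information for any realistic date;
the choice is mostly about matching what the surrounding interfaces use.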
I feel that 'kvm_get_walltime' that I added is a bit of a hack.
Some refactoring might improve things here.
For example, making kvm_get_walltime_and_clockread work in the non-TSC case as well
might make the code cleaner.
Patches to enable this feature in qemu are in the process of being sent to
the qemu-devel mailing list.
Best regards,
Maxim Levitsky
Maxim Levitsky (2):
  KVM: x86: implement KVM_SET_TSC_PRECISE/KVM_GET_TSC_PRECISE
  KVM: x86: introduce KVM_X86_QUIRK_TSC_HOST_ACCESS

 Documentation/virt/kvm/api.rst  | 56 +++++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/x86.c              | 88 +++++++++++++++++++++++++++++++--
 include/uapi/linux/kvm.h        | 14 ++++++
 4 files changed, 154 insertions(+), 5 deletions(-)
--
2.26.2