Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: Sean Christopherson
Date: Fri Oct 13 2023 - 19:26:53 EST


On Fri, Oct 13, 2023, Dongli Zhang wrote:
> On 10/13/23 11:07, Sean Christopherson wrote:
> > What if we add a module param to disable KVM's TSC synchronization craziness
> > entirely? If we first clean up the peroidic sync mess, then it seems like it'd
> > be relatively straightforward to let kill off all of the synchronization, including
> > the synchronization of kvmclock to the host's TSC-based CLOCK_MONOTONIC_RAW.
> >
> > Not intended to be a functional patch...
> >
> > ---
> > arch/x86/kvm/x86.c | 35 ++++++++++++++++++++++++++++++++---
> > 1 file changed, 32 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5b2104bdd99f..75fc6cbaef0d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -157,6 +157,9 @@ module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
> > static bool __read_mostly kvmclock_periodic_sync = true;
> > module_param(kvmclock_periodic_sync, bool, S_IRUGO);
> >
> > +static bool __read_mostly enable_tsc_sync = true;
> > +module_param_named(tsc_synchronization, enable_tsc_sync, bool, 0444);
> > +
> > /* tsc tolerance in parts per million - default to 1/2 of the NTP threshold */
> > static u32 __read_mostly tsc_tolerance_ppm = 250;
> > module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR);
> > @@ -2722,6 +2725,12 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
> > bool matched = false;
> > bool synchronizing = false;
> >
> > + if (!enable_tsc_sync) {
> > + offset = kvm_compute_l1_tsc_offset(vcpu, data);
> > + kvm_vcpu_write_tsc_offset(vcpu, offset);
> > + return;
> > + }
>
> TBH, I do not like this idea for two reasons.
>
> 1. As a very primary part of my work is to resolve kernel issue, when debugging
> any clock drift issue, it is really happy for me to see all vCPUs have the same
> vcpu->arch.tsc_offset in the coredump or vCPU debugfs.
>
> This patch may lead to that different vCPUs added at different time have
> different vcpu->arch.tsc_offset.

The expectation is that *if* userspace disables TSC synchronization, then userespace
would be responsible for setting the guest TSC offset directly. And disabling
TSC synchronization would be fully opt-in, i.e. we'd fix the existing masterclock
sync issues first.

> 2. Suppose the KVM host has been running for long time, and the drift between
> two domains would be accumulated to super large? (Even it may not introduce
> anything bad immediately)

That already happens today, e.g. unless the host does vCPU hotplug or is using
XEN's shared info page, masterclock updates effectively never happen. And I'm
not aware of a single bug report of someone complaining that kvmclock has drifted
from the host clock. The only bug reports we have are when KVM triggers an update
and causes time to jump from the guest's perspective.

If the guest needs accurate timekeeping, it's all but guaranteed to be using NTP
or an equivalent. I.e. the imprecision that's inherent in the pvclock ABI shouldn't
be problematic for any guest that actually cares.

> If the objective is to avoid masterclock updating, how about the below I copied
> from my prior diagnostic kernel to help debug this issue.
>
> The idea is to never update master clock, if tsc is stable (and masterclock is
> already used).

That's another option, but if there are no masterclock updates, then it suffers
the exact same (theoretical) problem as #2. And there are real downsides, e.g.
defining when KVM would synchronize kvmclock with the host clock would be
significantly harder, as we'd have to capture all the possible ways that KVM could
(temporarily) disable the masterclock and start synchronizing.