Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: Sean Christopherson
Date: Fri Oct 13 2023 - 16:03:38 EST


On Fri, Oct 13, 2023, David Woodhouse wrote:
> On Fri, 2023-10-13 at 12:02 -0700, Sean Christopherson wrote:
> > > But while I'm picking on the edge cases and suggesting that we *can*
> > > cope with some of them, I do agree with your suggestion that "let
> > > kvmclock run by itself without being clamped back to
> > > CLOCK_MONOTONIC_RAW" should be an opt *in* feature.
> >
> > Yeah, I'm of the mind that just because we can cope with some edge cases, doesn't
> > mean we should.  At this point, kvmclock really should be considered deprecated
> > on modern hardware.  I.e. needs to be supported for older VMs, but shouldn't be
> > advertised/used when creating entirely new VMs.
> >
> > Hence my desire to go with a low effort solution for getting kvmclock to play nice
> > with modern hardware.
>
> Yeah... although the kvmclock is also the *Xen* clock (and the clock on
> which Xen timers are based). So while I'm perfectly prepared to call
> those Xen guests "older VMs", I do still have to launch quite a lot of
> new ones the same... :)

Heh, yeah, by "new" I meant "new shapes/classes/types of VMs", not simply "new
instances of existing VM types".

> > > > > [1] Yes, I believe "back" does happen. I have test failures in my queue
> > > > > to look at, where guests see the "Xen" clock going backwards.
> > > >
> > > > Yeah, I assume "back" can happen based purely on the wierdness of the pvclock math.o
> > > >
> > > > What if we add a module param to disable KVM's TSC synchronization craziness
> > > > entirely?  If we first clean up the peroidic sync mess, then it seems like it'd
> > > > be relatively straightforward to let kill off all of the synchronization, including
> > > > the synchronization of kvmclock to the host's TSC-based CLOCK_MONOTONIC_RAW.
> > > >
> > > > Not intended to be a functional patch...
> > >
> > > Will stare harder at the actual patch when it isn't Friday night.
> > >
> > > In the meantime, I do think a KVM cap that the VMM opts into is better
> > > than a module param?
> >
> > Hmm, yeah, I think a capability would be cleaner overall.  Then KVM could return
> > -EINVAL instead of silently forcing synchronization if the platform conditions
> > aren't meant, e.g. if the TSC isn't constant or if the host timekeeping isn't
> > using TSC.
>
> Right.
>
> > The interaction with kvmclock_periodic_sync might be a bit awkward, but that's
> > easy enough to solve with a wrapper.
>
> At least that's all per-KVM already. We do also still need to deal with
> the mess of having a single system-wide kvm_guest_has_master_clock and
> different KVMs explicitly setting that to 1 or 0, don't we?

Hmm, I think the simplest way to handle kvm_guest_has_master_clock would be to
have KVM check that the host clocksource is TSC-based when enabling the capability,
but define the behavior to be that once kvmclock is tied to the TSC, it's *always*
tied to the TSC, even if the host switches to a different clock source. Then VMs
for which kvmclock is tied to TSC can simply not set kvm_guest_has_master_clock
and be skipped by pvclock_gtod_update_fn().

Side topic, I have no idea why that thing is an atomic. It's just a flag that
tracks if at least one VM is using masterclock, and its only usage is to disable
all masterclocks if the host stops using TSC as the clocksource for whatever reason.
It really should just be a simple bool that's accessed with {READ,WRITE}_ONCE().