Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: David Woodhouse
Date: Fri Oct 13 2023 - 15:12:25 EST


On Fri, 2023-10-13 at 12:02 -0700, Sean Christopherson wrote:
> On Fri, Oct 13, 2023, David Woodhouse wrote:
> > On Fri, 2023-10-13 at 11:07 -0700, Sean Christopherson wrote:
> > > I generally support the idea, but I think it needs to an opt-in from userspace.
> > > Essentially a "I pinky swear to give all vCPUs the same TSC frequency, to not
> > > suspend the host, and to not run software/firmware that writes IA32_TSC_ADJUST".
> > > AFAICT, there are too many edge cases and assumptions about userspace for KVM to
> > > safely couple kvmclock to guest TSC by default.
> >
> > I think IA32_TSC_ADJUST is OK, isn't it? There is a "real" TSC value
> > and if vCPUs adjust themselves forward and backwards from that, it's
> > just handled as a delta.
>
> I meant the host writing IA32_TSC_ADJUST.  E.g. if a host SMM handler mucks with
> TSC offsets to try and hide the time spent in the SMM handler, then the platform
> owner gets to keep the pieces.

Oh $DEITY yes, absolutely.

> > And we solved 'give all vCPUS the same TSC frequency' by making that
> > KVM-wide.
> >
> > Maybe suspending and resuming the host can be treated like live
> > migration, where you know the host TSC is different so you have to make
> > do with a delta based on CLOCK_TAI.
> >
> > But while I'm picking on the edge cases and suggesting that we *can*
> > cope with some of them, I do agree with your suggestion that "let
> > kvmclock run by itself without being clamped back to
> > CLOCK_MONOTONIC_RAW" should be an opt *in* feature.
>
> Yeah, I'm of the mind that just because we can cope with some edge cases, doesn't
> mean we should.  At this point, kvmclock really should be considered deprecated
> on modern hardware.  I.e. needs to be supported for older VMs, but shouldn't be
> advertised/used when creating entirely new VMs.
>
> Hence my desire to go with a low effort solution for getting kvmclock to play nice
> with modern hardware.

Yeah... although the kvmclock is also the *Xen* clock (and the clock on
which Xen timers are based). So while I'm perfectly prepared to call
those Xen guests "older VMs", I do still have to launch quite a lot of
new ones the same... :)


> > > > [1] Yes, I believe "back" does happen. I have test failures in my queue
> > > > to look at, where guests see the "Xen" clock going backwards.
> > >
> > > Yeah, I assume "back" can happen based purely on the wierdness of the pvclock math.o
> > >
> > > What if we add a module param to disable KVM's TSC synchronization craziness
> > > entirely?  If we first clean up the peroidic sync mess, then it seems like it'd
> > > be relatively straightforward to let kill off all of the synchronization, including
> > > the synchronization of kvmclock to the host's TSC-based CLOCK_MONOTONIC_RAW.
> > >
> > > Not intended to be a functional patch...
> >
> > Will stare harder at the actual patch when it isn't Friday night.
> >
> > In the meantime, I do think a KVM cap that the VMM opts into is better
> > than a module param?
>
> Hmm, yeah, I think a capability would be cleaner overall.  Then KVM could return
> -EINVAL instead of silently forcing synchronization if the platform conditions
> aren't meant, e.g. if the TSC isn't constant or if the host timekeeping isn't
> using TSC.

Right.

> The interaction with kvmclock_periodic_sync might be a bit awkward, but that's
> easy enough to solve with a wrapper.

At least that's all per-KVM already. We do also still need to deal with
the mess of having a single system-wide kvm_guest_has_master_clock and
different KVMs explicitly setting that to 1 or 0, don't we?

Attachment: smime.p7s
Description: S/MIME cryptographic signature