Re: [PATCH RFC 1/1] KVM: x86: add param to update master clock periodically

From: Sean Christopherson
Date: Fri Oct 13 2023 - 15:02:53 EST


On Fri, Oct 13, 2023, David Woodhouse wrote:
> On Fri, 2023-10-13 at 11:07 -0700, Sean Christopherson wrote:
> > I generally support the idea, but I think it needs to an opt-in from userspace.
> > Essentially a "I pinky swear to give all vCPUs the same TSC frequency, to not
> > suspend the host, and to not run software/firmware that writes IA32_TSC_ADJUST".
> > AFAICT, there are too many edge cases and assumptions about userspace for KVM to
> > safely couple kvmclock to guest TSC by default.
>
> I think IA32_TSC_ADJUST is OK, isn't it? There is a "real" TSC value
> and if vCPUs adjust themselves forward and backwards from that, it's
> just handled as a delta.

I meant the host writing IA32_TSC_ADJUST. E.g. if a host SMM handler mucks with
TSC offsets to try and hide the time spent in the SMM handler, then the platform
owner gets to keep the pieces.

> And we solved 'give all vCPUS the same TSC frequency' by making that
> KVM-wide.
>
> Maybe suspending and resuming the host can be treated like live
> migration, where you know the host TSC is different so you have to make
> do with a delta based on CLOCK_TAI.
>
> But while I'm picking on the edge cases and suggesting that we *can*
> cope with some of them, I do agree with your suggestion that "let
> kvmclock run by itself without being clamped back to
> CLOCK_MONOTONIC_RAW" should be an opt *in* feature.

Yeah, I'm of the mind that just because we can cope with some edge cases, doesn't
mean we should. At this point, kvmclock really should be considered deprecated
on modern hardware. I.e. needs to be supported for older VMs, but shouldn't be
advertised/used when creating entirely new VMs.

Hence my desire to go with a low effort solution for getting kvmclock to play nice
with modern hardware.

> > > [1] Yes, I believe "back" does happen. I have test failures in my queue
> > > to look at, where guests see the "Xen" clock going backwards.
> >
> > Yeah, I assume "back" can happen based purely on the wierdness of the pvclock math.o
> >
> > What if we add a module param to disable KVM's TSC synchronization craziness
> > entirely?  If we first clean up the peroidic sync mess, then it seems like it'd
> > be relatively straightforward to let kill off all of the synchronization, including
> > the synchronization of kvmclock to the host's TSC-based CLOCK_MONOTONIC_RAW.
> >
> > Not intended to be a functional patch...
>
> Will stare harder at the actual patch when it isn't Friday night.
>
> In the meantime, I do think a KVM cap that the VMM opts into is better
> than a module param?

Hmm, yeah, I think a capability would be cleaner overall. Then KVM could return
-EINVAL instead of silently forcing synchronization if the platform conditions
aren't meant, e.g. if the TSC isn't constant or if the host timekeeping isn't
using TSC.

The interaction with kvmclock_periodic_sync might be a bit awkward, but that's
easy enough to solve with a wrapper.